END-COMPLEMENTARY POLYMERASE REACTION
TECHNICAL FIELD 

The present invention relates generally to the field of recombinant DNA technology and, more particularly, to improved methods for producing amplified heterogeneous populations of polynucleotides from limited quantities of DNA, RNA, or other nucleic acids. The invention provides compositions and methods for chain reaction amplification of a target polynucleotide species using a thermostable polymerase or another suitable polynucleotide polymerase compatible with the method.

BACKGROUND

Selective amplification of polynucleotides represents a major research goal of molecular biology, with particular importance in diagnostic and forensic applications, as well as in the general manipulation of genetic materials and laboratory reagents.

The polymerase chain reaction (PCR) is a method by which a specific polynucleotide sequence can be amplified in vitro. PCR is an extremely powerful technique for amplifying specific polynucleotide sequences, including genomic DNA, single-stranded cDNA, and mRNA, among others. As described in U.S. Patent Nos. 4,683,202, 4,683,195, and 4,800,159 (which are incorporated herein by reference), PCR typically comprises treating separate complementary strands of a target nucleic acid with two oligonucleotide primers to form complementary primer extension products on both strands that act as templates for synthesizing copies of the desired nucleic acid sequences. By repeating the separation and synthesis steps in an automated system, essentially exponential duplication of the target sequences can be achieved.
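The phrase "essentially exponential duplication" can be made concrete with simple arithmetic. If every target strand were copied in every cycle, the copy number would double each cycle, giving N = N0 · 2^n after n cycles. The sketch below is illustrative only. It adds a hypothetical per-cycle efficiency parameter E, which is not part of this specification, to show how real yields fall short of exact doubling: N = N0 · (1 + E)^n.

```python
def ideal_pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of amplification.

    Assumes each existing copy is duplicated with a fixed, hypothetical
    probability `efficiency` per cycle, so N = N0 * (1 + E) ** n.
    Primer depletion and plateau effects are ignored.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Perfect doubling for 30 cycles from a single template molecule.
    print(ideal_pcr_copies(1, 30))  # 1073741824.0
    # The same run at a hypothetical 90% efficiency per cycle.
    print(ideal_pcr_copies(1, 30, 0.9))
```

Under the perfect-doubling assumption, 30 cycles multiply a single molecule roughly a billion-fold. Even modest losses per cycle compound substantially.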

A number of variations of the basic PCR methodology have been described. U.S. Patent No. 5,066,584 discloses a method wherein single-stranded DNA can be generated by the polymerase chain reaction using two oligonucleotide primers, one present in a limiting concentration. U.S. Patent No. 5,340,728 discloses an improved method for performing a nested polymerase chain reaction (PCR) amplification of a targeted piece of DNA. By controlling the annealing times and the concentrations of both the outer and the inner sets of primers according to the disclosed method, highly specific and efficient amplification of a targeted piece of DNA can be achieved without depletion or removal of the outer primers from the reaction vessel. U.S. Patent No. 5,286,632 discloses recombination PCR (RPCR), wherein PCR is used with at least two primer species to add double-stranded homologous ends to DNA such that the homologous ends undergo in vivo recombination following transfection of host cells.

Horton et al. (1989) Gene 77: 61, discloses a method for making chimeric genes using PCR to generate overlapping homologous regions. In the Horton method, fragments of the different genes that are to form the chimeric gene are generated in separate polymerase chain reactions. The primers used in these separate reactions are designed so that the ends of the different products contain complementary sequences. When these separately produced PCR products are mixed, denatured, and reannealed, the strands having matching sequences at their 3'-ends overlap and act as primers for each other. Extension of this overlap by DNA polymerase produces a molecule in which the original sequences are spliced together to form the chimeric gene.

Silver and Keerikatte (1989) J. Virol. 63: 1924, describe another variation of the standard PCR approach (which requires oligonucleotide primers complementary to both ends of the segment to be amplified) that allows amplification of DNA flanked on only one side by a region of known DNA sequence. This technique requires a known restriction site within the known DNA sequence and a similar site within the unknown flanking DNA sequence that is to be amplified. After restriction and recircularization, the recircularized fragment is cut at a unique site between the two primers, and the resulting linearized fragment is used as a template for PCR amplification.



wo 96/33207 PCT/US96/05480 


Triglia et al. (1988) Nucl. Acids Res. 16: 8186, describe an approach called "inverted PCR." It requires inverting the sequence of interest by circularization and re-opening at a site distinct from the one of interest. A fragment is first created in which two unknown sequences flank a region of known DNA sequence, one on either side. The fragment is then circularized and cleaved with a unique restriction endonuclease that cuts only within the known DNA sequence. This creates a new fragment that contains all of the DNA of the original fragment but in inverted form, with regions of known sequence flanking the region of unknown sequence. This fragment is then used as a PCR substrate to amplify the unknown sequence.

Vallette et al. (1989) Nucl. Acids Res. 17: 723, disclose an approach that uses a supercoiled plasmid DNA as the PCR template together with a primer bearing a mutated sequence, which is incorporated into the amplified product. Using this method, DNA sequences may be inserted only at the 5'-end of the DNA molecule to be altered. Mole et al. (1989) Nucl. Acids Res. 17: 3319, used PCR to create deletions within existing expression plasmids. In their method, PCR was performed around the entire plasmid (containing the fragment to be deleted) from primers whose 5'-ends defined the region to be deleted. Self-ligation of the PCR product then recircularized the plasmid.

U.S. Patent No. 5,279,952 discloses a method for using PCR to generate mutations (e.g., deletions) and chimeric genes by forming head-to-tail concatemers of a known starting sequence and employing at least two PCR primers to amplify a DNA segment that is altered as compared to the known starting sequence.

Jones and Howard (1990) BioTechniques 8: 178, report a site-specific mutagenesis method using PCR, termed recombinant circle PCR (RCPCR). In RCPCR, separate PCR amplifications (typically two) of a known polynucleotide generate products that, when combined, denatured, and annealed, form double-stranded DNA with discrete, cohesive single-stranded ends designed so that they may anneal and form circles of DNA.

Oliner et al. (1993) Nucl. Acids Res. 21: 5192, report a method for engineering PCR products to contain terminal sequences identical to sequences at the two ends of a linearized vector. Co-transfection of the PCR product and the linearized vector into a recombination-competent host cell then results in formation of a covalently linked vector containing the PCR product, avoiding the need for in vitro ligation.

In spite of such recent advances, including PCR and the various modifications noted above, there remains a need for improved methods of identifying and cloning polynucleotides, for accurate in vitro amplification of selected polynucleotides, and for facile assembly of polynucleotides from a mixture of component oligonucleotides or polynucleotides without the use of DNA ligase. In particular, there is a need for a PCR amplification method that can be performed (1) with only a single primer species, or (2) with multiple overlapping polynucleotide fragments (or oligonucleotides) in the absence of a conventional PCR primer, and that can yield an amplified product which is a concatemer and/or a covalently closed circle. The present invention fulfills these and other needs.

The references discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention. All publications cited are incorporated herein by reference.

SUMMARY OF THE INVENTION

A basis of the present invention is the use of polymerase-mediated chain extension, such as PCR, in combination with at least two polynucleotides having complementary ends that can anneal. At least one of said polynucleotides has a free 3'-hydroxyl capable of polynucleotide chain elongation by a DNA polymerase, such as a thermostable polymerase (e.g., Thermus aquaticus (Taq) polymerase, Thermococcus litoralis (Vent™, New England Biolabs) polymerase, or Tth polymerase (Perkin-Elmer)). Although the method can be practiced using PCR, some embodiments require either a single primer species or no primer whatsoever. PCR is therefore not a necessary component of the general method.

In one embodiment, a target polynucleotide is contacted with a "bivalent primer." This is typically an oligonucleotide having two regions of complementarity to the target polynucleotide: (1) a first portion, located in the 5' portion of the primer, which is substantially complementary to a sequence in the 3' portion of the sequence to be amplified (target sequence) in the target polynucleotide; and (2) a second portion, located in the 3' portion of the primer, which is substantially complementary to a sequence in the 5' portion of the target sequence in the target polynucleotide. The contacting is performed under conditions suitable for hybridization of the bivalent primer to the target polynucleotide. If the target polynucleotide is initially double-stranded, the contacting most often follows its thermal denaturation. The target polynucleotide may be substantially homogeneous or may be present in a mixture of polynucleotide species (e.g., in a genome, a biological sample, or a mixture of synthetic polynucleotides).

Subsequent to or concomitant with the contacting of the bivalent primer with the target polynucleotide, a polynucleotide polymerase such as a thermostable DNA polymerase catalyzes, under suitable reaction conditions, polynucleotide synthesis (chain elongation) primed from the 3'-hydroxyl of the annealed bivalent primer. This forms a strand complementary to the target sequence, the nascent complementary strand. Once the nascent complementary strand spans the target sequence, the target polynucleotide and the nascent strand are denatured, typically by elevating the temperature. They are then allowed to reanneal, typically by reducing the temperature, with another molecule of the bivalent primer species or with a complementary strand of a target polynucleotide or an amplified copy thereof. After the first elongation cycle, the denatured nascent strand contains a copy of the target sequence and, as a result of the bivalent primer, carries a repeat of its 5'-terminal sequence at its 3' terminus. This terminal repeat is long enough to support annealing under PCR conditions to an overlapping complementary strand in a head-to-tail arrangement (see Fig. 1).

Following reannealing, the polymerase elongation/denaturation/reannealing cycle described above is repeated from 1 to about 100 times as desired. The result is an amplified product comprising head-to-tail concatemers of the target sequence. The concatemers typically grow longer as the number of amplification cycles increases and as the amount of bivalent primer decreases. Once amplification has formed concatemeric head-to-tail repeats of the target sequence, the concatemer(s) can optionally be resolved in any of three ways: (1) by cleaving with a restriction endonuclease that cuts within (or at the termini of) the concatemeric unit(s); (2) by homologous recombination between concatemer units to form covalently closed circles; or (3) by cleavage with a restriction endonuclease followed by ligation with DNA ligase to form covalently closed circles, and/or by direct transformation into host cells for in vivo ligation.
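The first resolution option above cleaves a head-to-tail concatemer with a restriction endonuclease that cuts once per concatemeric unit. This can be sketched at the sequence level. The sketch works under stated assumptions: only one strand is modelled, cohesive overhangs are ignored, and the function name `cut_at_sites`, the example repeat unit, and the use of the EcoRI site GAATTC (cut after the G) are all illustrative choices, not taken from this specification.

```python
from typing import List


def cut_at_sites(seq: str, site: str, cut_offset: int) -> List[str]:
    """Cut a linear sequence at every occurrence of `site`.

    The cut falls `cut_offset` nucleotides into each site (for example 1
    for EcoRI, G^AATTC). Only one strand is modelled; overhangs are ignored.
    """
    fragments: List[str] = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            fragments.append(seq[start:cut])
            start = cut
        pos = seq.find(site, pos + 1)
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    unit = "CCGAATTCGG"    # hypothetical 10-nt unit containing one EcoRI site
    concatemer = unit * 3  # three head-to-tail copies of the unit
    pieces = cut_at_sites(concatemer, "GAATTC", 1)
    print(pieces)  # ['CCG', 'AATTCGGCCG', 'AATTCGGCCG', 'AATTCGG']
```

The internal fragments are full-length, circularly permuted copies of the repeat unit. Only the two terminal fragments are partial, which is why cutting within each unit converts a concatemer into a population of unit-length pieces.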

Often, a target polynucleotide sequence amplified as described above will form amplification intermediates in the form of cyclized DNA or spiral DNA (see Fig. 2). These arise when the 3' terminus of an overlapped nascent strand anneals to the 3' terminus of an overlapped complementary strand, forming a cyclized structure similar to a gapped circle. The cyclized structure has a strand with an extendable 3'-hydroxyl, which can be extended in a rolling circle format by a DNA polymerase substantially lacking exonuclease activity (e.g., a thermostable polymerase such as Vent(exo-)™, or Klenow fragment). In this format, the leading terminus of the nascent strand continually displaces the lagging portion of the nascent strand (see Fig. 2), producing a concatemeric single strand emanating from the rolling circle intermediate. Most often, such rolling circle intermediates form under dilute conditions, which favor intramolecular cyclization of overlapped strands. Once a rolling circle intermediate is established, the template need not be denatured in order to continue amplification of the target sequence, as it must be in conventional PCR. This avoids the need for multiple thermal cycles to denature the template, and the time lost to heating and cooling. Often, however, the template is still repeatedly denatured, annealed, and extended with polymerase in the presence of ribonucleotides or deoxyribonucleotides under suitable reaction conditions.
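At the sequence level, the single strand emanating from a rolling circle intermediate is the complement of the circular template, read over and over around the circle. The sketch below models only that sequence relationship. It is a simplification: strand displacement, processivity, and the gapped-circle geometry are not modelled, sequences are restricted to uppercase A, C, G and T, and the function name and example circle are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def rolling_circle_product(circle: str, start: int, length: int) -> str:
    """First `length` nt (written 5'->3') of a strand synthesized on a circular template.

    `circle` is the template written 5'->3'. Synthesis begins opposite
    template position `start`. Because the polymerase reads the template
    3'->5', successive bases pair with decreasing template indices,
    wrapping around the circle indefinitely.
    """
    n = len(circle)
    if n == 0:
        raise ValueError("circle must be non-empty")
    return "".join(COMPLEMENT[circle[(start - i) % n]] for i in range(length))


if __name__ == "__main__":
    circle = "ACGTT"  # hypothetical 5-nt circular template
    # Starting opposite the last template base yields tandem copies of the
    # reverse complement of the circle.
    print(rolling_circle_product(circle, start=len(circle) - 1, length=15))
    # AACGTAACGTAACGT
```

The output period equals the circle length. This is why, once a single circle has been primed, continued synthesis keeps adding concatenated copies without further denaturation.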

Furthermore, whether the method generates a rolling circle intermediate or linear concatemers, it requires a substantially reduced amount of primer (bivalent primer) compared with conventional PCR. After the initial cycle(s), an increasing percentage of nascent strand synthesis is primed from the 3'-hydroxyl groups of the amplified strands rather than from the oligonucleotide primer(s). In the case of a rolling circle intermediate, theoretically only a single bivalent primer molecule is needed to generate the rolling circle. The circle can then produce multiple concatenated copies by rolling circle-style catalysis, using a polymerase capable of displacing the lagging edge of the nascent strand as replication proceeds around the cyclized template.

In an embodiment, a product polynucleotide is assembled from a plurality of component polynucleotides by forming overlapped strands of alternating polarity that have substantially complementary termini (see Fig. 3). A series of overlapping, substantially complementary termini determines the linear order of the component sequences in the final product. Concomitant with or subsequent to formation of the overlapped strands in a reaction, a polynucleotide polymerase (e.g., a thermostable DNA polymerase) catalyzes, under suitable reaction conditions, strand elongation from the 3'-hydroxyl portions of the overlapped (annealed) joints. The polymerase fills in the portion between joints and, as the nascent strand elongates, processively displaces (or processively degrades exonucleolytically) the 5' termini of downstream component strands of the same polarity. After a cycle of chain elongation forms substantially double-stranded polynucleotides, the reaction conditions are altered (typically by increasing the temperature) to denature them. The conditions are then altered again to permit reannealing of complementary strands or portions thereof (i.e., overlapping termini), forming molecules with overlapped termini (joints). A polynucleotide polymerase then catalyzes strand elongation from the 3'-hydroxyl portions of the overlapped joints, as in the first cycle. One to about 100 cycles of denaturation/annealing/polymerization can be performed to generate a product in which the component polynucleotide sequences are covalently linked in the linear order set by the overlapping joints.

In this embodiment, a product polynucleotide can be constructed from a plurality of smaller component polynucleotides (typically oligonucleotides). This enables assembly of a variety of products with alternate, substitutable polynucleotide components at a given position that serve as structural "alleles" (see Fig. 4). The component polynucleotides are often provided in single-stranded form. They may instead be present initially in double-stranded form and be denatured (typically by elevated temperature) for assembly of the product by PCR amplification. Substantially any type of product polynucleotide can be assembled in this way, including cloning and expression vectors, viral genomes, gene therapy vectors, genes (including chimeric genes), polynucleotides encoding peptide libraries, protein libraries, vector libraries, viral libraries, and the like. In one variation, one or more of the component polynucleotides represents a site-directed mutation or variable-sequence kernel. In another variation, PCR employing a low-fidelity polymerase is used to introduce additional sequence variation into the product polynucleotide(s) during amplification cycles. The method can be used to produce a library of sequence-variant product polynucleotides, if desired.
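Overlapping joints fix the linear order of components. This can be illustrated by computing the top-strand sequence that results once every gap between joints has been filled in. The sketch below is a simplification under stated assumptions:

- Components are supplied in their linear order as (sequence, strand) pairs, each written 5'->3' on its own strand.
- '-' strand components are converted to top-strand coordinates by reverse complementation.
- Adjacent components are joined at their longest exact overlap of at least `min_overlap` nucleotides. Mismatched ("substantially complementary") joints are not handled.

The function names and example sequences are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble(components: List[Tuple[str, str]], min_overlap: int) -> str:
    """Top-strand sequence of the filled-in product of ordered, overlapping components."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    top = [s if strand == "+" else reverse_complement(s) for s, strand in components]
    product = top[0]
    for frag in top[1:]:
        overlap = 0
        # Try the longest possible joint first, down to min_overlap.
        for k in range(min(len(product), len(frag)), min_overlap - 1, -1):
            if product.endswith(frag[:k]):
                overlap = k
                break
        if overlap == 0:
            raise ValueError("no joint of at least min_overlap nt between adjacent components")
        product += frag[overlap:]
    return product


if __name__ == "__main__":
    target = "ATGACCGTTAGCTTGACCATGCGA"  # hypothetical 24-nt product
    parts = [
        (target[0:10], "+"),                      # top-strand component
        (reverse_complement(target[6:18]), "-"),  # bottom-strand component
        (target[14:24], "+"),                     # top-strand component
    ]
    print(assemble(parts, min_overlap=4) == target)  # True
```

Swapping the middle component for a variant that keeps the same terminal overlaps yields a different product with the flanking components unchanged. This mirrors the structural "alleles" idea at the sequence level.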
In an embodiment of the invention, very long distance PCR is provided. PCR or another suitable amplification method is used to generate, in a single reaction or in parallel reactions that are subsequently pooled, a set of overlapping large DNA fragments. These fragments can be denatured and annealed to form very large DNA structures (e.g., greater than 25 to 50 kilobases) composed of overlapped single strands of DNA of alternating polarity. Each overlapped joint provides an extendable 3'-hydroxyl group for forming phosphodiester bonds, catalyzed by a polynucleotide polymerase in the presence of free ribonucleotides or deoxyribonucleotides. Typically, the method comprises forming at least three overlapping polynucleotides. The 3' terminus of a first single-stranded polynucleotide is substantially complementary to the 3' terminus of a second single-stranded polynucleotide of the opposite polarity. The 5' terminus of said second single-stranded polynucleotide is substantially complementary to the 3' terminus of a third single-stranded polynucleotide having the same polarity as said first single-stranded polynucleotide. This generates an overlapped structure capable of chain elongation by a suitable polymerase to produce a double-stranded product spanning the three initial overlapped polynucleotides. With such a method, polynucleotides of 50 kb to 100 kb or more can be generated by a facile amplification method capable of producing amplification products much longer than is possible with conventional long-range PCR methods. The method can comprise parallel-processing PCR reactions, in which a plurality of primer sets are employed in a single reaction or in multiple reactions that are subsequently pooled. Each primer set primes the PCR amplification of a polynucleotide sequence whose terminal sequences are complementary to terminal sequences in at least one other amplification product produced by a different primer set. The result is a set of overlapping PCR products, from which a large product spanning the entire set is generated by end-complementary polymerase reaction.
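Laying out the primer sets for the parallel-processing approach amounts to tiling the long target with amplicons whose ends overlap, so that each product shares terminal sequence with its neighbour. The helper below is a hypothetical planning sketch, not a method from this specification. It assumes one fixed amplicon length and one fixed overlap, returns half-open coordinates only, and does not design primers. The example lengths are arbitrary illustrations.

```python
from typing import List, Tuple


def tile_amplicons(total_length: int, amplicon_length: int, overlap: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) coordinates of overlapping amplicons covering [0, total_length).

    Consecutive amplicons share exactly `overlap` nucleotides; the final
    amplicon may be shorter so that it ends at total_length.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    if not 0 < overlap < amplicon_length:
        raise ValueError("overlap must be positive and shorter than an amplicon")
    step = amplicon_length - overlap
    tiles: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + amplicon_length, total_length)
        tiles.append((start, end))
        if end == total_length:
            return tiles
        start += step


if __name__ == "__main__":
    # A hypothetical 50 kb target tiled with 20 kb amplicons overlapping by 1 kb.
    print(tile_amplicons(50_000, 20_000, 1_000))
    # [(0, 20000), (19000, 39000), (38000, 50000)]
```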

In some embodiments of the invention, the polynucleotide product(s) generated are labelled, such as with radioisotopic, biotinyl, or fluorescent label moieties, by polymerase-mediated incorporation of labelled ribonucleotides or deoxyribonucleotides or the like into the nascent polynucleotide.
The invention also provides kits comprising a bivalent primer polynucleotide and/or a plurality of component polynucleotides, together with instructions for use describing the end-complementary amplification method disclosed herein. Frequently, a polynucleotide polymerase such as a thermostable DNA polymerase (Taq or Vent™ polymerase) is also present in the kit. Optionally, one or more target polynucleotides may be provided in the kit, such as for calibration and/or for use as a positive control to verify correct performance of the kit.
In an embodiment, the invention provides a method termed continuous multiplex amplification. It amplifies a plurality of initially unlinked polynucleotide species at substantially comparable rates by forming a linked amplification product, in which the initially unlinked sequences are joined by end-complementary amplification. An amplification unit, termed an amplicon, comprising at least one copy of each member of the plurality of initially unlinked polynucleotide species is formed by one or more cycles of end-complementary amplification. From one to about 100 (typically three to 35) amplification cycles can be conducted, resulting in a population of linked amplification products that can comprise concatemers of said amplicon. The amplification products can be linear or circular, as desired, depending on the choice of bivalent primers. In a variation, the amplification product is cleaved with a nucleolytic agent or by other suitable cleaving means. Suitable agents include a restriction enzyme that cuts at least one restriction site present in the amplicon, DNase, nuclease S1, bleomycin, ionizing radiation, or the like.

A further understanding of the nature and advantages of the invention will become apparent by reference to the remaining portions of the specification and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1. Schematic of bivalent primer and concatemer formation in end-complementary PCR.

Fig. 2. Schematic depiction of cyclized intermediates and rolling circle amplification.

Fig. 3. Schematic of overlapping fragment PCR for construction and amplification of larger products from component polynucleotides.

Fig. 4. Schematic of multiple "alleles" with overlapping fragment PCR.

Fig. 5A-E. Oligonucleotides used to construct the 2.7 kb circular plasmid p182SfiI by end-complementary polymerase reaction.

Fig. 6. Schematic of plasmid construction by overlapping fragment PCR as performed in the Experimental Examples.

Fig. 7. Agarose gel electrophoresis of products generated during construction of the 2.7 kb circular plasmid p182SfiI by end-complementary polymerase reaction. The circled letters refer to aliquots removed from various amplification reactions: (A) is the mixture of oligonucleotides without polymerase, (B) is the product of the first set of amplification cycles, (C) is the product of the second set of amplification cycles, and (D) is the final product.

Fig. 8. Schematic of end-complementary polymerase reaction (ECPR) in conjunction with parallel-processing PCR to amplify very large polynucleotides, such as those larger than can be amplified reliably by conventional PCR using only a single primer set.

Figs. 9A-9C. Schematic of continuous circular multiplex amplification methodology exemplifying amplification of two unlinked polynucleotides, double-stranded ABC/A'B'C' and double-stranded DEF/D'E'F'. A and A', B and B', C and C', D and D', E and E', and F and F' each represent a set of complementary (or substantially complementary) polynucleotide sequences present in the initially unlinked polynucleotides. X and X', and Y and Y', each represent a set of complementary (or substantially complementary) polynucleotide sequences present in the bivalent primers CYD, C'X'D', FYA, and F'X'A'. In some embodiments, X and X' and/or Y and Y' can be omitted. Fig. 9A shows the initial (starting) conditions, with a plurality of polynucleotide species (shown for example as two double-stranded molecules) to be amplified by continuous circular multiplex amplification. The second step, "Anneal with Primers", shows the structural features of the bivalent primers and their mode of annealing to single-stranded polynucleotide species, shown as denatured double-stranded complementary polynucleotides. The third step, "Extend and Melt", shows the amplification products after a cycle of extension primed by the bivalent primers; these products are denatured for subsequent round(s) of amplification. Fig. 9B shows the possible modes of reannealing of the amplification products generated in the first round of amplification. Fig. 9C shows the amplification products that can result from the second (or subsequent) rounds of amplification. Each product molecule is capable of self-replication because it has complementary ends, and the product molecules can also cross-replicate. Each product molecule comprises copies of the initially unlinked polynucleotide sequences in equimolar ratios.

Figs. 10A-10C. Schematic of continuous linear multiplex amplification methodology exemplifying amplification of two unlinked polynucleotides, double-stranded ABC/A'B'C' and double-stranded DEF/D'E'F'. A and A', B and B', C and C', D and D', E and E', and F and F' each represent a set of complementary (or substantially complementary) polynucleotide sequences present in the initially unlinked polynucleotides. X and X', and Y and Y', each represent a set of complementary (or substantially complementary) polynucleotide sequences. X' and Y are present in the bivalent primers CYD and C'X'D'. The univalent primers are F' and A. In some embodiments, X and X' and/or Y and Y' can be omitted. Fig. 10A shows the initial (starting) conditions, with a plurality of polynucleotide species (shown for example as two double-stranded molecules) to be amplified by continuous circular multiplex amplification. The second step, "Anneal with Primers", shows the structural features of the bivalent and univalent primers and their mode of annealing to single-stranded polynucleotide species, shown as denatured double-stranded complementary polynucleotides. The third step, "Extend and Melt", shows the amplification products after a cycle of extension primed by the bivalent and univalent primers; these products are denatured for subsequent round(s) of amplification. Fig. 10B shows the possible modes of reannealing of the amplification products generated in the first round of amplification. Fig. 10C shows the amplification products that can result from the second (or subsequent) rounds of amplification; each product molecule comprises copies of the initially unlinked polynucleotide sequences in equimolar ratios.

Figs. 11A-11C. Schematic of continuous circular multiplex amplification methodology exemplifying amplification of two possibly unlinked polynucleotides embedded in distinct locations in a genome or pool of DNA molecules. Fig. 11A shows the initial (starting) conditions, with a plurality of polynucleotide sequences (shown for example as two double-stranded sequences embedded in discrete genomic locations) to be amplified by continuous circular multiplex amplification. First, the genomic sequences are amplified using a low concentration of conventional amplification primers (shown as PCR primers C', F', A, and D), as indicated under "Anneal #1". The concentration of conventional primers, the initial copy number, and the number of amplification cycles are chosen so that primers for rapidly extending fragments are consumed while slowly extending sequences are allowed to amplify. Fig. 11B shows that bivalent primers (FT3'XT7A and CYD) are used in subsequent rounds of amplification. In this example, one of the bivalent primers (FT3'XT7A) comprises one or more promoter sequences, in this case a T3 promoter and a T7 promoter oriented in opposite transcriptional polarities. The mode of hybridization of the bivalent primers to denatured amplification product is shown under "Anneal with Primers", and the resultant amplification products are shown under "Extend and Melt". Fig. 11C shows, under "Reannealing", possible modes of reannealing of the denatured products of amplification using the bivalent primers. Examples of the resultant products of self-primed amplification are shown under "Extend". Each of the product polynucleotides shown has complementary ends and is capable of self-replication and cross-replication. The sequences X and Y, if present, can comprise restriction sites, if desired.

Figure 12. Arsenate, arsenite, and antimony resistance for E. coli strain TG1 without a plasmid, with plasmid pGJ103 carrying the wild-type ars operon, or with pGJ103 mutagenized by three cycles of PCR shuffling. Cells grown overnight in LB with 2 mM, 10 mM, or 128 mM arsenate were diluted 10,000-fold into LB with added oxyanions as indicated, and turbidity was measured after 16 hours of growth at 37°C. Equal amounts of cells (OD600) were plated on plates with a range of arsenate concentrations and grown overnight at 37°C. Cell growth was quantitated by resuspending the cells and measuring the OD600.

Figure 13. Cells as in Figure 12 were washed, suspended in triethanolamine buffer, and exposed to 3 mM radiolabeled arsenate. Samples were removed periodically, heated to 100°C, and centrifuged. Radiolabeled arsenate and arsenite were quantitated after thin-layer chromatographic separation.


DESCRIPTION OF THE PREFERRED EMBODIMENTS

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. For purposes of the present invention, the following terms are defined below.

Definitions 

Unless specified otherwise, the conventional notation used herein portrays polynucleotides as follows: the left-hand end of a single-stranded polynucleotide sequence is the 5' end, and the left-hand direction of a double-stranded polynucleotide sequence is referred to as the 5' direction. The direction of 5' to 3' addition of nascent RNA transcripts is referred to as the transcription direction. Sequence regions on the DNA strand having the same sequence as the RNA and lying 5' to the 5' end of the RNA transcript are referred to as "upstream sequences". Sequence regions on the DNA strand having the same sequence as the RNA and lying 3' to the 3' end of the coding RNA transcript are referred to as "downstream sequences".

As used herein, the term "polynucleotide" refers to
a polymer composed of a multiplicity of nucleotide units
(ribonucleotide or deoxyribonucleotide or related structural
variants) linked via phosphodiester bonds. A polynucleotide
can be of substantially any length, typically from about 10
nucleotides to about 1x10^ nucleotides or larger. As used
herein, an "oligonucleotide" is defined as a polynucleotide of
from 6 to 100 nucleotides in length. Thus, an oligonucleotide
is a subset of polynucleotides.

The term "naturally-occurring" as used herein as
applied to an object refers to the fact that an object can be
found in nature. For example, a polypeptide or polynucleotide
sequence that is present in an organism (including viruses)
that can be isolated from a source in nature and which has not
been intentionally modified by man in the laboratory is
naturally-occurring. Generally, the term naturally-occurring
refers to an object as present in a non-pathological
(undiseased) individual, such as would be typical for the
species.

The term "corresponds to" is used herein to mean
that a polynucleotide sequence is homologous (i.e., is
identical, not strictly evolutionarily related) to all or a
portion of a reference polynucleotide sequence. In
contradistinction, the term "complementary to" is used herein
to mean that the complementary sequence is homologous to all
or a portion of a reference polynucleotide sequence. For
illustration, the nucleotide sequence "TATAC" corresponds to a
reference sequence "TATAC" and is complementary to a reference
sequence "GTATA".
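The two definitions above can be checked mechanically. The following is a minimal sketch, assuming DNA written 5' to 3' over the alphabet A, C, G, T; because paired strands are antiparallel, "complementary to" is tested here by reverse-complementing the query. The function names are illustrative only.

```python
# Sketch of the "corresponds to" / "complementary to" definitions,
# assuming DNA sequences written 5'->3' using only A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of seq, also written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def corresponds_to(query: str, reference: str) -> bool:
    """True if query is identical to all or a portion of reference."""
    return query.upper() in reference.upper()


def complementary_to(query: str, reference: str) -> bool:
    """True if the complement of query is identical to all or part of reference."""
    return reverse_complement(query) in reference.upper()


if __name__ == "__main__":
    print(corresponds_to("TATAC", "TATAC"))    # True
    print(complementary_to("TATAC", "GTATA"))  # True
    print(reverse_complement("TATAC"))         # GTATA
```

The example pair from the text behaves as described: "TATAC" corresponds to "TATAC", and its reverse complement is "GTATA".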

The following terms are used to describe the
sequence relationships between two or more polynucleotides:
"reference sequence", "comparison window", "sequence
identity", "percentage of sequence identity", and "substantial
identity". A "reference sequence" is a defined sequence used
as a basis for a sequence comparison; a reference sequence may
be a subset of a larger sequence, for example, as a segment of
a full-length cDNA or gene sequence given in a sequence
listing, or may comprise a complete cDNA or gene sequence.
Generally, a reference sequence is at least 12 nucleotides in
length, frequently at least 15 to 18 nucleotides in length,
and often at least 25 nucleotides in length. Since two
polynucleotides may each (1) comprise a sequence (i.e., a
portion of the complete polynucleotide sequence) that is
similar between the two polynucleotides, and (2) further
comprise a sequence that is divergent between the two
polynucleotides, sequence comparisons between two (or more)
polynucleotides are typically performed by comparing sequences
of the two polynucleotides over a "comparison window" to
identify and compare local regions of sequence similarity.

A "comparison window", as used herein, refers to a
conceptual segment of at least 12 contiguous nucleotide
positions wherein a polynucleotide sequence may be compared to
a reference sequence of at least 12 contiguous nucleotides and
wherein the portion of the polynucleotide sequence in the
comparison window may comprise additions or deletions (i.e.,
gaps) of 20 percent or less as compared to the reference
sequence (which does not comprise additions or deletions) for
optimal alignment of the two sequences. Optimal alignment of
sequences for aligning a comparison window may be conducted by
the local homology algorithm of Smith and Waterman (1981) Adv.
Appl. Math. 2: 482, by the homology alignment algorithm of
Needleman and Wunsch (1970) J. Mol. Biol. 48: 443, by the
search for similarity method of Pearson and Lipman (1988)
Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444, by computerized
implementations of these algorithms (GAP, BESTFIT, FASTA, and
TFASTA in the Wisconsin Genetics Software Package Release 7.0,
Genetics Computer Group, 575 Science Dr., Madison, WI), or by
inspection, and the best alignment (i.e., resulting in the
highest percentage of homology over the comparison window)
generated by the various methods is selected.
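To make the global alignment approach cited above concrete, the following is a minimal sketch of Needleman-Wunsch dynamic programming. It assumes a simple linear scoring scheme (match +1, mismatch -1, gap -1) chosen only for illustration; GAP and the other programs named above use their own scoring parameters, which are not reproduced here.

```python
from __future__ import annotations


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str, int]:
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell, writing '-' for gaps.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, s = needleman_wunsch("GATTACA", "GATCA")
    print(top)
    print(bottom)
    print("score:", s)
```

When several alignments tie, which one is returned depends on the traceback order. Any of them has the same score, and each can serve as the "optimal alignment" input for the percentage-of-identity calculation defined next.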

The term "sequence identity" means that two
polynucleotide sequences are identical (i.e., on a
nucleotide-by-nucleotide basis) over the window of comparison. The term
"percentage of sequence identity" is calculated by comparing
two optimally aligned sequences over the window of comparison,
determining the number of positions at which the identical
nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both
sequences to yield the number of matched positions, dividing
the number of matched positions by the total number of
positions in the window of comparison (i.e., the window size),
and multiplying the result by 100 to yield the percentage of
sequence identity. The term "substantial identity" as used
herein denotes a characteristic of a polynucleotide sequence,
wherein the polynucleotide comprises a sequence that has at
least 80 percent sequence identity, preferably at least 85
percent identity and often 90 to 95 percent sequence identity,
more usually at least 99 percent sequence identity as compared
to a reference sequence over a comparison window of at least
20 nucleotide positions, frequently over a window of at least
25-50 nucleotides, wherein the percentage of sequence identity
is calculated by comparing the reference sequence to the
polynucleotide sequence, which may include deletions or
additions which total 20 percent or less of the reference
sequence over the window of comparison. The reference
sequence may be a subset of a larger sequence.
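The percentage calculation and the "substantial identity" thresholds above reduce to simple arithmetic. The following is a minimal sketch, assuming the two sequences have already been optimally aligned over the comparison window, with '-' marking a gap. Gap columns count toward the window size but can never be matched positions. The default thresholds (80 percent identity, 20-position window, gaps totaling at most 20 percent of the reference) are taken from the text.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Matched positions divided by window size, times 100."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    window = len(aligned_query)
    matched = sum(1 for q, r in zip(aligned_query, aligned_ref)
                  if q == r and q != "-")
    return 100.0 * matched / window


def has_substantial_identity(aligned_query: str, aligned_ref: str,
                             min_identity: float = 80.0,
                             min_window: int = 20) -> bool:
    """Apply the identity, window-size, and 20 percent gap limits from the text."""
    if len(aligned_ref) < min_window:
        return False
    ref_length = sum(1 for r in aligned_ref if r != "-")
    gap_columns = sum(1 for q, r in zip(aligned_query, aligned_ref)
                      if q == "-" or r == "-")
    return (percent_identity(aligned_query, aligned_ref) >= min_identity
            and gap_columns <= 0.2 * ref_length)


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGTACGT"      # hypothetical 20-nt reference window
    query = "ACGTACGTACGTACGTACAA"    # two substitutions at the 3' end
    print(percent_identity(query, ref))          # 90.0
    print(has_substantial_identity(query, ref))  # True
```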

The primers herein are selected to be substantially
complementary to the different strands of each specific
sequence to be amplified. The primers must be sufficiently
complementary to hybridize with their respective strands;
therefore, the primer sequence need not reflect the exact
sequence of the template. For example, a non-complementary
nucleotide fragment may be attached to the 5' end of the
primer, with the remainder of the primer sequence being
complementary to the strand. Alternatively, non-complementary
bases or longer sequences can be interspersed into the primer,
provided that the primer sequence has sufficient
complementarity with the sequence of the strand to be
amplified to hybridize therewith and thereby form a template
for synthesis of the extension product of the other primer.

As used herein, a "bivalent primer" is a
polynucleotide having two regions of complementarity to a
predetermined target polynucleotide: (1) a first portion which
is in the 5' portion of the bivalent primer and which is
substantially complementary to a sequence in the 3' portion of
the sequence to be amplified (target sequence) in the target
polynucleotide, and (2) a second portion which is in the 3'
portion of the primer and which is substantially complementary
to a sequence in the 5' portion of the sequence to be
amplified (target sequence) in the target polynucleotide. The
portion of the bivalent primer which is substantially
complementary to a sequence in the 3' portion of the sequence
to be amplified (target sequence) is sufficiently long and
sufficiently complementary to the target sequence to anneal
under the reaction conditions and serve as an extendable
primer for the polymerase to catalyze chain elongation.
Similarly, the portion of the bivalent primer which is
substantially complementary to a sequence in the 5' portion of
the sequence to be amplified (target sequence) is sufficiently
long and sufficiently complementary to the target sequence to
anneal under the reaction conditions and serve as an
extendable primer for the polymerase to catalyze chain
elongation. Practitioners in the art will select at their
discretion the specific structure of the bivalent primer(s) to
be used in view of the necessity for annealing to the target.
Typically, the portions of the bivalent primer which are
substantially complementary to sequences in the 5' and 3'
portions of the sequence to be amplified (target sequence) are
each at least 12 to 15 nucleotides in length, often 18 to 20
nucleotides in length, and are preferably 100 percent
identical to the complement of the annealing portion of the
target sequence. Often, bivalent primers of the invention are
oligonucleotides.
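The layout of a bivalent primer can be illustrated at the sequence level. The following is a minimal sketch under one assumed convention that is not stated in the text: every sequence is written 5' to 3' on the sense strand of the target, and the primer's 3' portion anneals to the antisense strand. Under that convention, the primer's 5' portion repeats the target's 3' end, and its 3' portion repeats the target's 5' end. The target sequence and portion lengths are hypothetical, and annealing is modeled only as string overlap.

```python
def bivalent_primer(target: str, first_len: int, second_len: int) -> str:
    """5' portion copies the target's 3' end; 3' portion copies its 5' end."""
    if first_len + second_len > len(target):
        raise ValueError("the two portions must not exceed the target length")
    return target[-first_len:] + target[:second_len]


def first_extension_product(primer: str, target: str, second_len: int) -> str:
    """Nascent strand after the primer's 3' end is extended across the target."""
    if not primer.endswith(target[:second_len]):
        raise ValueError("primer 3' portion does not match the target 5' end")
    # Polymerase appends the rest of the target after the primer's 3' portion.
    return primer + target[second_len:]


def join_head_to_tail(left: str, right: str, overlap: int) -> str:
    """Merge two strands whose terminal repeats (length `overlap`) coincide."""
    if left[-overlap:] != right[:overlap]:
        raise ValueError("termini do not overlap")
    return left + right[overlap:]


if __name__ == "__main__":
    target = "ATGACCGGTTCAGCTAGCAAGTTC"  # hypothetical 24-nt target
    primer = bivalent_primer(target, first_len=6, second_len=6)
    product = first_extension_product(primer, target, second_len=6)
    # The product's 5'-terminal 6 nt are repeated at its 3' terminus.
    print(product[:6] == product[-6:])
    # Overlapping two such strands on the repeat yields a head-to-tail dimer.
    print(join_head_to_tail(product, product, 6) == target[-6:] + target * 2)
```

The terminal repeat produced by the first extension is what later allows head-to-tail overlaps and concatemer growth, as described under Single-Primer Amplification.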

The term "primer" as used herein refers to an
oligonucleotide, whether occurring naturally as in a purified
restriction digest or produced synthetically, which is capable
of acting as a point of initiation of synthesis when placed
under conditions in which synthesis of a primer extension
product which is complementary to a nucleic acid strand is
induced, i.e., in the presence of nucleotides and an agent for
polymerization such as DNA polymerase and at a suitable
temperature and pH. The primer is preferably single-stranded
for maximum efficiency in amplification, but may alternatively
be double-stranded. If double-stranded, the primer is first
treated to separate its strands before being used to prepare
extension products. Preferably, the primer is an
oligodeoxyribonucleotide. The primer must be sufficiently long
to prime the synthesis of extension products in the presence
of the agent for polymerization. The exact lengths of the
primers will depend on many factors, including temperature and
source of primers. For example, depending on the complexity
of the target sequence, the oligonucleotide primer typically
contains 15-25 or more nucleotides, although it may contain
fewer nucleotides. Short primer molecules generally require
cooler temperatures to form sufficiently stable hybrid
complexes with template. In some embodiments, the primers can
be large polynucleotides, such as from about 200 nucleotides
to several kilobases or more.
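The observation that short primers need cooler temperatures can be made concrete with a rough melting-temperature estimate. The following sketch uses the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule is not taken from the text. It is a commonly quoted approximation for short oligonucleotides (roughly 14 to 20 nt) and ignores salt, primer concentration, and nearest-neighbor effects.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (degrees C) by the Wallace rule of thumb."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical primers with the same base composition at two lengths.
    print(wallace_tm("ACGTACGTACGTACGTACGT"))  # 20 nt -> 60
    print(wallace_tm("ACGTACGTAC"))            # 10 nt -> 30
```

Halving the primer length in this example halves the estimated Tm, consistent with the need for cooler annealing temperatures for short primers.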

As used herein, "suitable reaction conditions" are
those conditions suitable for conducting PCR amplification
using conventional reagents. Such conditions are known or
readily established by those of skill in the art, and can be
exemplified by the reaction conditions used in U.S. Patents
4,683,202, 4,683,195, and 4,800,159, which are incorporated
herein by reference. As one example and not to limit the
invention, suitable reaction conditions can comprise: 0.2 mM
each dNTP, 2.2 mM MgCl2, 50 mM KCl, 10 mM Tris-HCl pH 9.0,
0.1% Triton X-100.
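Assembling the example reaction conditions from stock solutions is a C1·V1 = C2·V2 calculation for each component. The following is a minimal sketch. The final concentrations are those listed above, but the stock concentrations are hypothetical placeholders to be replaced with the practitioner's own. The remaining volume (template, primers, polymerase, water) is not computed here.

```python
from __future__ import annotations

# Final concentrations from the example above (mM; Triton X-100 in % v/v).
FINAL = {
    "dNTP (each)": 0.2,
    "MgCl2": 2.2,
    "KCl": 50.0,
    "Tris-HCl pH 9.0": 10.0,
    "Triton X-100": 0.1,
}

# Hypothetical stock concentrations in the same units; substitute your own.
STOCK = {
    "dNTP (each)": 10.0,
    "MgCl2": 25.0,
    "KCl": 1000.0,
    "Tris-HCl pH 9.0": 1000.0,
    "Triton X-100": 10.0,
}


def stock_volumes(total_ul: float) -> dict[str, float]:
    """Microliters of each stock for a total_ul reaction, from C1*V1 = C2*V2."""
    return {name: FINAL[name] * total_ul / STOCK[name] for name in FINAL}


if __name__ == "__main__":
    for name, vol in stock_volumes(50.0).items():
        print(f"{name}: {vol:.2f} uL")
```

For a 50 µL reaction with these assumed stocks, the sketch gives 1.00 µL of each dNTP stock, 4.40 µL MgCl2, 2.50 µL KCl, 0.50 µL Tris-HCl, and 0.50 µL Triton X-100.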
As used herein the term "physiological conditions"
refers to temperature, pH, ionic strength, viscosity, and like
biochemical parameters which are compatible with a viable
organism, and/or which typically exist intracellularly in a
viable cultured yeast cell or mammalian cell. For example,
the intracellular conditions in a yeast cell grown under
typical laboratory culture conditions are physiological
conditions. Suitable in vitro reaction conditions for PCR and
many polynucleotide enzymatic reactions and manipulations are
generally physiological conditions. In general, in vitro
physiological conditions comprise 50-200 mM NaCl or KCl, pH
6.5-8.5, 20-45°C and 0.001-10 mM divalent cation (e.g., Mg++,
Ca++); preferably about 150 mM NaCl or KCl, pH 7.2-7.6, 5 mM
divalent cation, and often include 0.01-1.0 percent
nonspecific protein (e.g., BSA). A non-ionic detergent
(Tween, NP-40, Triton X-100) can often be present, usually at
about 0.001 to 2%, typically 0.05-0.2% (v/v). Particular
aqueous conditions may be selected by the practitioner
according to conventional methods. For general guidance, the
following buffered aqueous conditions may be applicable:
10-250 mM NaCl, 5-50 mM Tris HCl, pH 5-8, with optional addition
of divalent cation(s) and/or metal chelators and/or nonionic
detergents and/or membrane fractions and/or antifoam agents
and/or scintillants.

As used herein, the terms "label" or "labeled"
refer to incorporation of a detectable marker, e.g., by
incorporation of a radiolabeled nucleotide or incorporation of
a nucleotide having biotinyl moieties that can be detected by
marked avidin (e.g., streptavidin containing a fluorescent
marker or enzymatic activity that can be detected by optical
or colorimetric methods). Various methods of labeling
polynucleotides are known in the art and may be used.
Examples of labels include, but are not limited to, the
following: radioisotopes (e.g., 3H, 14C, 35S, 125I, 131I),
fluorescent labels (e.g., FITC, rhodamine, lanthanide
phosphors), enzymatic labels (e.g., horseradish peroxidase,
β-galactosidase, luciferase, alkaline phosphatase), biotinyl
groups, and the like. In some embodiments, labels are
attached by spacer arms of various lengths to reduce potential
steric hindrance.

As used herein, "substantially pure" means an object
species is the predominant species present (i.e., on a molar
basis it is more abundant than any other individual
macromolecular species in the composition), and preferably a
substantially purified fraction is a composition wherein the
object species comprises at least about 50 percent (on a molar
basis) of all macromolecular species present. Generally, a
substantially pure composition will comprise more than about
80 to 90 percent of all macromolecular species present in the
composition. Most preferably, the object species is purified
to essential homogeneity (contaminant species cannot be
detected in the composition by conventional detection methods)
wherein the composition consists essentially of a single
macromolecular species. Solvent species, small molecules
(<500 Daltons), and elemental ion species are not considered
macromolecular species.
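The molar-basis purity criteria above can be expressed as a short calculation. The following is a minimal sketch, assuming each species in a composition is described by a hypothetical (amount in moles, molecular mass in daltons) pair. Species below 500 Da are excluded as non-macromolecular, following the definition. The example composition is invented for illustration.

```python
from __future__ import annotations

# Species below this mass are not macromolecular, per the definition above.
MACROMOLECULE_MIN_DALTONS = 500.0


def purity_summary(composition: dict[str, tuple[float, float]],
                   target: str) -> dict[str, object]:
    """composition maps species name -> (amount in moles, molecular mass in Da)."""
    macro = {name: moles for name, (moles, mass) in composition.items()
             if mass >= MACROMOLECULE_MIN_DALTONS}
    total = sum(macro.values())
    fraction = macro[target] / total
    predominant = all(macro[target] > moles
                      for name, moles in macro.items() if name != target)
    return {
        "molar_fraction": fraction,            # share of macromolecular moles
        "predominant": predominant,            # exceeds every other macromolecule
        "at_least_50_percent": fraction >= 0.5,
    }


if __name__ == "__main__":
    # Hypothetical composition; amounts in nmol, masses in Da.
    sample = {
        "product": (8.0, 30000.0),
        "contaminant_a": (1.5, 45000.0),
        "contaminant_b": (0.5, 12000.0),
        "Tris": (1000.0, 121.14),  # small molecule: excluded
    }
    print(purity_summary(sample, "product"))
```

In this invented example the buffer dominates on a molar basis but is ignored, and the product makes up 80 percent of the macromolecular species.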

The term "recombinant" used herein refers to
macromolecules produced by recombinant DNA techniques wherein
the gene coding for a polypeptide is cloned by known
recombinant DNA technology. For example, an amplified or
assembled product polynucleotide may be inserted into a
suitable DNA vector, such as a bacterial plasmid, and the
plasmid used to transform a suitable host. The gene is then
expressed in the host to produce the recombinant protein. The
transformed host may be prokaryotic or eukaryotic, including
mammalian, yeast, Aspergillus and insect cells. One preferred
embodiment employs bacterial cells as the host.
Alternatively, the product polynucleotide may serve a
non-coding function (e.g., promoter, origin of replication,
ribosome-binding site, etc.).

Generally, the nomenclature used hereafter and many
of the laboratory procedures in cell culture, molecular
genetics, and nucleic acid chemistry and hybridization
described below are those well known and commonly employed in
the art. Standard techniques are used for recombinant nucleic
acid methods, polynucleotide synthesis, in vitro polypeptide
synthesis, and the like, and for microbial culture and
transformation (e.g., electroporation). Generally, enzymatic
reactions and purification steps are performed according to
the manufacturer's specifications. The techniques and
procedures are generally performed according to conventional
methods in the art and various general references (see,
generally, Sambrook et al., Molecular Cloning: A Laboratory
Manual, 2d ed. (1989) Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y.; each of which is incorporated herein
by reference) which are provided throughout this document.
The procedures therein are believed to be well known in the
art and are provided for the convenience of the reader. All
the information contained therein is incorporated herein by
reference.

Oligonucleotides can be synthesized on an Applied
Biosystems oligonucleotide synthesizer according to
specifications provided by the manufacturer.

Methods for PCR amplification are described in the
art (PCR Technology: Principles and Applications for DNA
Amplification, ed. H.A. Erlich, Stockton Press, New York, NY
(1989); PCR Protocols: A Guide to Methods and Applications,
eds. Innis, Gelfand, Sninsky, and White, Academic Press, San
Diego, CA (1990); Mattila et al. (1991) Nucleic Acids Res. 19:
4967; Eckert, K.A. and Kunkel, T.A. (1991) PCR Methods and
Applications 1: 17; and U.S. Patent Nos. 4,683,202 and
4,965,188, each of which is incorporated herein by reference)
and exemplified hereinbelow.

Overview 

A basis of the present invention is the use of a
polymerase in combination with at least two polynucleotides
having complementary ends which can anneal, whereby at least
one of said polynucleotides has a free 3'-hydroxyl capable of
polynucleotide chain elongation by a DNA polymerase, such as a
thermostable polymerase (e.g., Thermus aquaticus (Taq)
polymerase or Thermococcus litoralis (Vent™) polymerase). In
an embodiment, the method is performed using PCR, typically
with multiple cycles of heat denaturation and DNA synthesis.
However, there are several variations of the basic method of
end-complementary polymerase reaction which are exemplified
hereinbelow and which shall be evident to the skilled artisan
in view of the present specification. Some variations do not
require primers and/or sequential cycles of thermal
denaturation.

In embodiments where the product size increases with
the number of denaturation, annealing, and extension cycles
(e.g., as the mean length of concatemers increases), it is
typically advantageous to increase the denaturation
temperature, and optionally increase the reannealing time, for
subsequent cycles. Such conditions are readily optimized by
the practitioner using pilot reactions to establish a
calibration curve for any particular embodiment.
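A cycling program that raises the denaturation temperature and lengthens reannealing as cycles progress can be generated programmatically. The following is a minimal sketch. All numeric defaults (starting temperature, per-cycle increments, temperature ceiling, reannealing times) are hypothetical placeholders. The text leaves these values to pilot reactions and does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    number: int
    denature_c: float  # denaturation temperature, degrees C
    anneal_s: int      # reannealing time, seconds


def ramped_program(n_cycles: int, denature_start_c: float = 94.0,
                   denature_step_c: float = 0.1, denature_max_c: float = 98.0,
                   anneal_start_s: int = 30, anneal_step_s: int = 1) -> list[Cycle]:
    """Cycles whose denaturation temperature and reannealing time both increase."""
    program = []
    for k in range(n_cycles):
        # Raise the temperature each cycle, but never above the ceiling.
        temp = min(denature_start_c + k * denature_step_c, denature_max_c)
        program.append(Cycle(k + 1, round(temp, 2), anneal_start_s + k * anneal_step_s))
    return program


if __name__ == "__main__":
    for cycle in ramped_program(50)[::10]:
        print(cycle)
```

Capping the denaturation temperature reflects a practical limit on polymerase stability. In a real protocol, the ceiling and increments would come from the calibration experiments the text describes.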

Single-Primer Amplification

A target polynucleotide is contacted with a bivalent
primer typically comprising an oligonucleotide having two
regions of complementarity to the target polynucleotide: (1) a
first portion which is in the 5' portion of the primer and
which is substantially complementary to a sequence in the 3'
portion of the sequence to be amplified (target sequence) in
the target polynucleotide, and (2) a second portion which is
in the 3' portion of the primer and which is substantially
complementary to a sequence in the 5' portion of the sequence
to be amplified (target sequence) in the target
polynucleotide. The contacting is performed under conditions
suitable for hybridization of the bivalent primer to the
target polynucleotide for polymerase-mediated chain
elongation, most often following thermal denaturation of the
target polynucleotide if it is initially present in
double-stranded form.

The first portion of the bivalent primer which is in
the 5' portion of the primer and which is substantially
complementary to a sequence in the 3' portion of the sequence
to be amplified (target sequence) in the target polynucleotide
is typically at least 12 nucleotides in length, often at least
15 nucleotides in length, frequently at least 18 nucleotides
in length, and is commonly 20 to 25 or more nucleotides in
length, but usually does not exceed 10,000 nucleotides in
length and is frequently less than 50 to 500 nucleotides in
length. The first portion of the bivalent primer is
substantially identical to the complement of a sequence at the
3' end of the target sequence; however, there may be additional
terminal nucleotides of the first portion of the bivalent
primer which are substantially non-identical to a target
sequence or its complement. Such terminal nucleotides must be
substantially non-interfering so that their presence does not
significantly inhibit the capability of the bivalent primer to
selectively anneal to the target sequence and initiate chain
elongation under suitable reaction conditions in the presence
of polymerase. Although the first portion of the bivalent
primer is substantially identical to the complement of a
sequence at the 3' end of the target sequence, it need not be
exactly identical; often a sequence identity of at least 80
percent is sufficient, typically at least 90 percent sequence
identity is present, and preferably at least 95 percent or 100
percent sequence identity is present. As the length of the
complementary sequence increases, typically the percentage of
sequence identity necessary for specific annealing decreases
within certain limits (pp. 399-407, in Berger and Kimmel,
Methods in Enzymology, Volume 152: Guide to Molecular Cloning
Techniques (1987), Academic Press, Inc., San Diego, CA, which
is incorporated herein by reference).

The second portion of the bivalent primer which is
in the 3' portion of the primer and which is substantially
complementary to a sequence in the 5' portion of the sequence
to be amplified (target sequence) in the target polynucleotide
is typically at least 12 nucleotides in length, often at least
15 nucleotides in length, frequently at least 18 nucleotides
in length, and is commonly 20 to 25 or more nucleotides in
length, but usually does not exceed 10,000 nucleotides in
length and is frequently less than 50 to 500 nucleotides in
length. The second portion of the bivalent primer is
substantially identical to the complement of a sequence at the
5' end of the target sequence; however, there may be additional
terminal nucleotides of the second portion of the bivalent
primer which are substantially non-identical to a target
sequence or its complement. Such terminal nucleotides must be
substantially non-interfering so that their presence does not
significantly inhibit the capability of the bivalent primer to
selectively anneal to the target sequence and initiate chain
elongation under suitable reaction conditions in the presence
of polymerase. Although the second portion of the bivalent
primer is substantially identical to the complement of a
sequence at the 5' end of the target sequence, it need not be
exactly identical; often a sequence identity of at least 80
percent is sufficient, typically at least 90 percent sequence
identity is present, and preferably at least 95 percent or 100
percent sequence identity is present. In some embodiments,
sequence identity of less than 80 percent is practicable, but
the amount of sequence identity and the length of overlap for
the joints are left to the discretion of the practitioner.

The amount of sequence identity necessary for any
given application will vary depending on several factors
including: (1) complexity of the population of polynucleotides
in which the target polynucleotide(s) is/are present, (2)
temperature and ionic strength, (3) sequence composition of
the target sequence, (4) length of sequence identity, and (5)
size of the primer. Practitioners will select bivalent
primers having a first portion with sufficient sequence
identity and length to serve as selective amplification
primers which specifically hybridize to the desired target
polynucleotide(s). Specific hybridization is the formation of
hybrids between a primer polynucleotide and a target
polynucleotide, wherein the primer polynucleotide
preferentially hybridizes to the target DNA such that, for
example, at least one discrete band can be identified on a gel
of amplification products obtained from amplification of
genomic DNA prepared from eukaryotic cells that contain (or
are spiked with) the target polynucleotide sequence. In some
instances, a target sequence may be present in more than one
target polynucleotide species (e.g., a particular target
sequence may occur in multiple members of a gene family or in
a known repetitive sequence). It is evident that optimal
hybridization conditions will vary depending upon the sequence
composition and length(s) of the targeting polynucleotide(s)
and target(s), and the experimental method selected by the
practitioner. Various guidelines may be used to select
appropriate primer sequences and hybridization conditions
(see, Maniatis et al., Molecular Cloning: A Laboratory Manual
(1989), 2nd Ed., Cold Spring Harbor, N.Y.; Berger and Kimmel,
Methods in Enzymology, Volume 152: Guide to Molecular Cloning
Techniques (1987), Academic Press, Inc., San Diego, CA; PCR
Protocols: A Guide to Methods and Applications, eds. Innis,
Gelfand, Sninsky, and White, Academic Press, San Diego, CA
(1990); Benton WD and Davis RW (1977) Science 196: 180;
Goodspeed et al. (1989) Gene 76: 1; Dunn et al. (1989) J.
Biol. Chem. 264: 13057, which are incorporated herein by
reference).

The target polynucleotide may be substantially
homogeneous or may be present in a mixture of polynucleotide
species (e.g., in a genome, biological sample, or mixture of
synthetic polynucleotides). Subsequent to or concomitant with
the contacting of the bivalent primer to the target
polynucleotide, a polynucleotide polymerase, such as a
thermostable DNA polymerase, e.g., Taq polymerase, Tth
polymerase (Perkin Elmer) or Vent™ (New England Biolabs,
Beverly, MA), catalyzes, under suitable reaction conditions,
polynucleotide synthesis (chain elongation) primed from the
3'-hydroxyl of the annealed bivalent primer to form a strand
complementary to the target sequence, thereby forming a
nascent complementary strand. Following completion of the
nascent complementary strand spanning the target sequence, the
target polynucleotide and the nascent strand are denatured,
typically by elevation of temperature, and allowed to
reanneal, typically by reduction of temperature, with another
molecule of the bivalent primer species or with a
complementary strand of a target polynucleotide or an
amplified copy thereof. The denatured nascent strand species
following the first elongation cycle will contain a copy of
the target sequence and a terminal repeat of its 5'-terminal
sequence at its 3' terminus, resulting from the bivalent
primer; the terminal repeat is of sufficient length to support
annealing under PCR conditions to an overlapping complementary
strand in a head-to-tail arrangement (see Fig. 1). Following
reannealing, the described polymerase
elongation/denaturation/reannealing cycle is repeated from 1 to
about 100 times as desired, resulting in formation of amplified
product which comprises head-to-tail concatemers of the target
sequence. The concatemers typically increase in length as the
number of amplification cycles increases and as the amount of
bivalent primer decreases. Following amplification forming
concatemeric head-to-tail repeats of the target sequence, the
concatemer(s) can optionally be resolved, such as (1) by
cleaving with a restriction endonuclease which cuts within (or
at the termini of) the concatemeric unit(s), (2) by homologous
recombination between concatemer units to form covalently
closed circles, or (3) by cleavage with a restriction
endonuclease followed by ligation with DNA ligase to form
covalently closed circles and/or by direct transformation into
host cells for in vivo ligation.
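Resolution option (1), cleaving a concatemer with a restriction endonuclease that cuts once within each unit, can be illustrated at the sequence level. The following is a minimal sketch that models only the top strand and ignores the staggered cut on the bottom strand. It uses the EcoRI recognition site G^AATTC (cut after the first base) as an example enzyme. The repeat unit is hypothetical and contains exactly one site.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand cut_offset bases into every occurrence of site."""
    fragments: list[str] = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return [f for f in fragments if f]  # drop empty end fragments


if __name__ == "__main__":
    unit = "GAATTCCTAGCTAGGT"       # hypothetical 16-nt repeat unit, one EcoRI site
    concatemer = unit * 3           # head-to-tail trimer
    for fragment in digest(concatemer, "GAATTC", 1):
        print(len(fragment), fragment)
```

Internal fragments are full-length unit-sized pieces (a circular permutation of the repeat unit, beginning at the cut site), while the two end fragments are partial. This is why a single site per unit suffices to recover monomers from a head-to-tail concatemer.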

By this method, a single primer (bivalent primer) is
used to amplify a target polynucleotide sequence having a
predetermined 5' terminal sequence and a predetermined 3'
terminal sequence. The predetermined 5' terminal sequence and
the predetermined 3' terminal sequence may be contained
internally within a larger polynucleotide; hence the term
"terminal" refers only to their terminality within the target
sequence, not necessarily within the complete target
polynucleotide, which may be a superset of the target sequence.

Rolling Circle PCR Amplification

Often, a target polynucleotide sequence which is
amplified by the present method will form amplification
intermediates in the form of cyclized DNA (see Fig. 2), as a
result of the 3' terminus of an overlapped nascent strand
annealing to the 3' terminus of an overlapped complementary
strand, forming a cyclized (circular) structure similar to a
gapped circle. The cyclized structure has a strand with an
extendable 3'-hydroxyl which can be extended with a DNA
polymerase substantially lacking exonuclease activity (e.g., a
thermostable polymerase such as Vent(exo-)™, or Klenow
fragment, etc.) in a rolling circle format whereby the leading
terminus of the nascent strand continually displaces the
lagging portion of the nascent strand (see Fig. 2), producing
a concatemeric single strand propagating from the rolling
circle intermediate. Most often, such rolling circle
intermediates will form under dilute conditions, which favor
intramolecular cyclization of overlapped strands over
formation of additional intermolecular overlaps. Once a
rolling circle intermediate is established, the template need
not be denatured in order to continue amplification of the
target sequence as in conventional PCR, since the polymerase
continues around the circle processively. Thus, the
advantageous formation of the rolling circle intermediate in
the present method avoids the necessity of multiple thermal
cycles of PCR to repeatedly denature and renature the
amplification template (and the resultant time loss needed for
heating and cooling).
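The reason a rolling circle needs no further denaturation is that the template has no end: the polymerase keeps copying the same circle. The following is a minimal sketch of that idea at the sequence level. It assumes the circular template is written 5' to 3' with indices wrapping around, and that the primer's 3'-terminal nucleotide pairs with template position `primer_pos`. Strand displacement and enzyme behavior are not modeled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rolling_circle(template: str, primer_pos: int, n_added: int) -> str:
    """Nucleotides added to the nascent strand (5'->3') on a circular template.

    The nascent strand is antiparallel to the template, so each new base
    pairs with the template position one step toward the template's 5' end
    (a decreasing index that wraps around the circle).
    """
    size = len(template)
    added = []
    for k in range(1, n_added + 1):
        added.append(COMPLEMENT[template[(primer_pos - k) % size]])
    return "".join(added)


if __name__ == "__main__":
    circle = "ACGTTG"  # hypothetical 6-nt circular template
    # Three trips around the circle give three tandem copies of its complement.
    print(rolling_circle(circle, 0, 18))  # CAACGTCAACGTCAACGT
```

The output is a tandem (concatemeric) single strand whose repeat unit is the reverse complement of the circle, and its length grows with extension time rather than with the number of thermal cycles.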

Overlapped Assembly of Polynucleotides

The present invention also provides for assembly of
one or more product polynucleotide(s) from a plurality of
component polynucleotides which have overlapping complementary
sequence portions at their termini. The component
polynucleotides are conveniently single-stranded
oligonucleotides, but can include double-stranded
polynucleotides (which are generally denatured with elevated
temperature) and long single-stranded polynucleotides.

A desired product polynucleotide (or polynucleotide
library) is assembled from a plurality of component
polynucleotides by formation of overlapped strands of
alternating polarity having substantially complementary
termini (see Fig. 3). This method employs a series of
overlapping, substantially complementary termini to determine
the linear order of component sequences in the final product.
Concomitant with or subsequent to formation of the overlapped
strands of the component polynucleotides in a reaction, a
polynucleotide polymerase (e.g., a thermostable DNA
polymerase) under suitable reaction conditions catalyzes
strand elongation from the 3'-hydroxyl portions of the
overlapped (annealed) joints, filling in the portion between
joints and processively displacing, or processively degrading
exonucleolytically, the 5' termini of downstream component
strands of the same polarity as the nascent strand elongates.
After a cycle of chain elongation forming substantially
double-stranded polynucleotides, the reaction conditions are
altered (typically by increasing the temperature) to denature
the double-stranded polynucleotides. The conditions are then
altered again to permit reannealing of complementary strands
or portions thereof (i.e., overlapping termini) to form
molecules having overlapped termini (joints), and a
polynucleotide polymerase under suitable reaction conditions
catalyzes strand elongation from the 3'-hydroxyl portions of
the overlapped (annealed) joints, as in the first cycle. One
to about 100 cycles of denaturation/annealing/polymerization
can be performed to generate a product comprising the
component polynucleotide sequences covalently linked in linear
order according to the order of the overlapping joints. In
this embodiment, a product polynucleotide can be constructed
from a plurality of smaller component polynucleotides
(typically oligonucleotides), which enables assembly of a
variety of products with alternate, substitutable
polynucleotide components at a given position serving as
structural "alleles" (see Fig. 4). The component
polynucleotides are often provided in single-strand form, but
may initially be present in double-strand form and be
denatured (typically by elevated temperature) for the assembly
of the product by PCR amplification. Substantially any type
of product polynucleotide can be assembled in this way,
including cloning and expression vectors, viral genomes, gene
therapy vectors, genes (including chimeric genes),
polynucleotides encoding peptide libraries, and the like. In
a variation, one or more of the component polynucleotides
represents a site-directed mutation or variable-sequence
kernel. In another variation, PCR employing a low-fidelity
polymerase is used to introduce additional sequence variation
into the product polynucleotide(s) during amplification
cycles. The method can be used to produce a library of
sequence-variant product polynucleotides, if desired.
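The way overlapping termini fix the linear order of the components can be illustrated in silico. The Python sketch below is a simplification, not the disclosed method: it assumes all components are written in the same (sense) orientation, that each joint is a unique exact overlap of at least `min_overlap` nucleotides, and it ignores strand polarity, mismatches and polymerase chemistry. The function names and the 59-nt target are hypothetical.

```python
def overlap_len(left: str, right: str, min_overlap: int) -> int:
    """Longest suffix of `left` equal to a prefix of `right`, or 0 if shorter than min_overlap."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if k > 0 and left[-k:] == right[:k]:
            return k
    return 0


def assemble_by_overlaps(fragments: list[str], min_overlap: int = 10) -> str:
    """Join fragments into one linear product, letting the overlapping joints decide the order."""
    successor: dict[int, tuple[int, int]] = {}  # fragment index -> (next index, overlap length)
    has_predecessor: set[int] = set()
    for i, left in enumerate(fragments):
        for j, right in enumerate(fragments):
            if i == j:
                continue
            k = overlap_len(left, right, min_overlap)
            if k:
                if i in successor or j in has_predecessor:
                    raise ValueError("ambiguous joint: overlaps must be unique")
                successor[i] = (j, k)
                has_predecessor.add(j)

    starts = [i for i in range(len(fragments)) if i not in has_predecessor]
    if len(starts) != 1:
        raise ValueError("overlaps do not define a single linear order")

    current = starts[0]
    product = fragments[current]
    used = {current}
    while current in successor:
        current, k = successor[current]
        product += fragments[current][k:]  # append only the sequence beyond the joint
        used.add(current)
    if len(used) != len(fragments):
        raise ValueError("some fragments could not be joined")
    return product


# Hypothetical target cut into three pieces that overlap by 10 nt, supplied out of order.
target = "ATGACCATGATTACGGATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCC"
pieces = [target[30:], target[0:25], target[15:40]]
product = assemble_by_overlaps(pieces, min_overlap=10)
print(product == target)  # True
```

Swapping a piece for an alternative with the same terminal overlaps changes the product without changing the order, which is the sense in which substitutable components act as structural "alleles".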

Kits 

The invention also provides kits comprising a
bivalent primer polynucleotide and/or a plurality of component
polynucleotides and instructions for use describing the
present end-complementary amplification method disclosed
herein. Frequently, a polynucleotide polymerase, such as a
thermostable DNA polymerase (Taq or Vent™ polymerase), is also
present in the kit. Optionally, one or more target
polynucleotides may be provided in the kit, such as for
calibration and/or for use as a positive control to verify
correct performance of the kit.

General Aspects

The target polynucleotides or component
polynucleotides may be obtained from any source, for example,
from plasmids such as pBR322, from cloned DNA or RNA, or from
natural DNA or RNA from any source, including bacteria, yeast,
viruses, and higher organisms such as plants or animals. DNA
or RNA may be extracted from blood or from tissue material such
as chorionic villi or amniotic cells by a variety of techniques,
such as that described by Maniatis et al., Molecular Cloning:
A Laboratory Manual (New York: Cold Spring Harbor Laboratory,
1982), pp. 280-281. Alternatively, the polynucleotides may be
produced by chemical synthesis by any of the art-recognized
methods.

Any specific nucleic acid sequence can be produced
by the present process. It is only necessary that a sufficient
number of bases at both ends of the sequence be known in
sufficient detail that a bivalent primer can be prepared
which will hybridize to the desired sequence, at relative
positions along the sequence such that an extension product
initially synthesized from the bivalent primer, once it is
separated from its template (complement), can anneal with a
strand of the opposite polarity to form an overlapped joint of
a head-to-tail concatemer and serve as a template for
extension of the 3'-hydroxyl from each overlapped joint. The
greater the knowledge about the bases at both ends of the
sequence, the greater the specificity of the primer for the
target nucleic acid sequence can be, and thus the greater the
efficiency of the process. It will be understood that the
term "bivalent primer" as used hereinafter may refer to more
than one bivalent primer, particularly where there is some
ambiguity in the information regarding the terminal
sequence(s) of the fragment to be amplified. For instance,
where a nucleic acid sequence is inferred from protein
sequence information, a collection of primers
containing sequences representing all possible codon
variations based on degeneracy of the genetic code will be
used for each strand.
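The codon-degeneracy case can be sketched as follows. This Python example is illustrative only: it assumes the standard genetic code (NCBI translation table 1), enumerates every sense-strand sequence encoding a short hypothetical peptide, and then collapses that set into IUPAC mixed-base notation, which is one common way such a primer collection is ordered for synthesis. The function names are hypothetical, and the primer for the opposite strand (the reverse complement) is not shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons run TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Back-translation table: amino acid (or "*" for stop) -> list of codons.
BACK_TABLE: dict[str, list[str]] = {}
for codon, aa in zip(CODONS, AMINO_ACIDS):
    BACK_TABLE.setdefault(aa, []).append(codon)

# IUPAC nucleotide codes keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_primers(peptide: str) -> list[str]:
    """All sense-strand DNA sequences that encode `peptide` (one-letter amino-acid codes)."""
    codon_choices = [BACK_TABLE[aa] for aa in peptide.upper()]
    return ["".join(codons) for codons in product(*codon_choices)]


def to_iupac(sequences: list[str]) -> str:
    """Collapse equal-length sequences into one IUPAC string, position by position."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*sequences))


primers = degenerate_primers("MWKF")
print(len(primers))       # 4  (1 x 1 x 2 x 2 codon choices)
print(to_iupac(primers))  # ATGTGGAARTTY
```

Collapsing to IUPAC codes can over-represent the intended set: for six-codon amino acids such as Ser (TCN and AGY), the position-wise mixture WSN also contains non-serine codons, so an explicit list of primers may be preferable for such residues.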

The polynucleotide primers may be prepared using any
suitable method, such as, for example, the phosphotriester and
phosphodiester methods, or automated embodiments thereof. In
one such automated embodiment, diethylphosphoramidites are used
as starting materials and may be synthesized as described by
Beaucage et al. (1981) Tetrahedron Letters 22: 1859. One
method for synthesizing oligonucleotides on a modified solid
support is described in U.S. Pat. No. 4,458,066. It is also
possible to use a primer which has been isolated from a
biological source (such as a restriction endonuclease digest
or the like).

The specific nucleic acid sequence is produced by
using the target polynucleotide containing that sequence as a
template. If the target polynucleotide contains two strands,
it is necessary to separate the strands of the nucleic acid
before it can be used as the template, either as a separate
step or simultaneously with the synthesis of the primer
extension products. This strand separation can be
accomplished by any suitable denaturing method, including
physical, chemical or enzymatic means. One physical method of
separating the strands of the polynucleotide involves heating
the polynucleotide until it is substantially denatured.
Typical heat denaturation may involve temperatures ranging
from about 80° to 105°C for times ranging from about 10
seconds to about 10 minutes or more. Strand separation may
also be induced by an enzyme from the class of enzymes known
as helicases, or by the enzyme RecA, which has helicase
activity and in the presence of rATP is known to denature DNA.
The reaction conditions suitable for separating the strands of
polynucleotides with helicases are described in Cold Spring
Harbor Symposia on Quantitative Biology, Vol. XLIII, "DNA:
Replication and Recombination" (New York: Cold Spring Harbor
Laboratory, 1978), B. Kuhn et al., "DNA Helicases",
pp. 63-67, and techniques for using RecA are reviewed in
C. Radding, Ann. Rev. Genetics, 16:405-37 (1982).
PCR synthesis can be performed using any suitable
method. Generally it occurs in a buffered aqueous solution,
preferably at a pH of 7-9, most preferably about 8. The
bivalent primer(s) is/are added in suitable amounts (molar
ratio to target), typically lower than in conventional PCR
methods because of the self-priming nature of the overlapped
concatemers. The deoxyribonucleoside triphosphates dATP,
dCTP, dGTP and TTP are also added to the synthesis mixture in
adequate amounts, and the resulting solution is heated to about
85°-100°C for from about 1 to 10 minutes, preferably from 1
to 4 minutes. After this heating period the solution is
allowed to cool to 20°-40°C, which is preferable for
primer hybridization. To the cooled mixture is added an agent
for polymerization, and the reaction is allowed to occur under
conditions known in the art. This synthesis reaction may occur
at from room temperature up to a temperature above which the
agent for polymerization no longer functions efficiently.
Thus, for example, if DNA polymerase is used as the agent for
polymerization, the temperature is generally no greater than
about 45°C. The agent for polymerization may be any compound
or system which will function to accomplish the synthesis of
primer extension products, including enzymes. Suitable enzymes
for this purpose include, for example, E. coli DNA polymerase
I, Klenow fragment of E. coli DNA polymerase I, T4 DNA
polymerase, other available DNA polymerases, reverse
transcriptase, and other enzymes, including heat-stable
enzymes, which will facilitate combination of the nucleotides
in the proper manner to form the primer extension products
which are complementary to each nucleic acid strand.
Generally, the synthesis will be initiated at the 3' end of
the primer and proceed in the 5' direction along the
template strand until synthesis terminates, producing
molecules of different lengths.

The newly synthesized strand and its complementary
nucleic acid strand form a double-stranded molecule which is
used in the succeeding steps of the process. In the next step,
the strands of the double-stranded molecule are separated
using any of the procedures described above to provide
single-stranded molecules.

The steps of strand separation and extension product
synthesis can be repeated as often as needed to produce the
desired quantity of the specific nucleic acid sequence. The
amount of the specific nucleic acid sequence produced will
accumulate in an exponential fashion, and the average size of
the product will also increase as the length of the
concatemers increases with each cycle.
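The exponential accumulation described above is often summarized with the idealized relation N_n = N_0(1 + E)^n, where N_0 is the starting number of template copies, E is the per-cycle efficiency (1.0 for perfect doubling) and n is the number of cycles. The Python sketch below evaluates it. It is a textbook approximation rather than part of the disclosed method: it ignores plateau effects and does not model the concatemer length growth mentioned above. The function names are hypothetical.

```python
import math


def copies_after(cycles: int, initial_copies: float, efficiency: float = 1.0) -> float:
    """Idealized copy number N0 * (1 + E)**n after `cycles` rounds."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles for which the idealized copy number reaches the target."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(math.log(target_copies / initial_copies) / math.log(1.0 + efficiency))


print(copies_after(10, 1))          # 1024.0
print(cycles_needed(1, 1_000_000))  # 20
print(cycles_needed(1, 1_000_000, efficiency=0.9))
```

Lowering E even modestly adds cycles, which is one reason the text stresses factors that affect amplification rate.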

The method herein may also be used to enable
detection and/or characterization of specific nucleic acid
sequences associated with infectious diseases, genetic
disorders or cellular disorders such as cancer (e.g.,
oncogenes). Amplification is useful when the amount of nucleic
acid available for analysis is very small, as, for example, in
the prenatal diagnosis of sickle cell anemia using DNA
obtained from fetal cells.

Continuous Multiplex Amplification 
Continuous multiplex amplification can be used to
amplify by any suitable amplification method, typically by
PCR, a plurality of unlinked or distantly linked
polynucleotide sequences. Certain genetic diagnostic tests
require amplification of multiple segments (e.g., exons) of a
gene. Each segment is typically amplified in a separate
amplification reaction. Unfortunately, it is generally
difficult or impossible to amplify each segment in
approximately equimolar ratios due to differences in priming
efficiency, length of extension, secondary structure, or other
factors which affect amplification rate. In continuous
multiplex amplification, the amplification reactions can be
run together in a single reaction vessel using a common pool
of reagents, and the unlinked (or distantly linked) sequences
become part of the same amplification product, which affords
substantially equimolar amplification of the unlinked (or
distantly linked) sequences. An embodiment of the invention
is illustrated schematically in Figs. 9A-9C for the case of
amplifying two unlinked sequences represented in the double-
stranded polynucleotide fragments ABC/A'B'C' and DEF/D'E'F'.
Primers C'X'D', F'X'A', FYA, and CYD are added and annealed to
the denatured polynucleotide fragments; the primer
concentrations are typically lower than those conventionally
used for PCR primers. X and Y, and their complements X' and
Y', are generally predetermined sequences which are selected
to destabilize the primer:primer hybrids CYD/C'X'D' and
FYA/F'X'A', such as by having the X and Y sequences (and their
complements) lack substantial sequence identity. After
extension with a polymerase, the following products and their
complements result: ABCXD, DEFXA, FYABC, and CYDEF. A variety
of product:product and primer:product hybrid combinations can
form, and after another round of amplification a variety of
amplification products result. Each of these products is
capable of self-priming with its complement or with the
complement of another fragment which has a complementary
sequence. Through multiple cycles of amplification, the
initial primer population becomes depleted and primarily
extended products remain. These extended products will prime
each other and generate increasingly longer amplification
products which contain the initially unlinked (or distantly
linked) sequences in equal amounts. After completion of
amplification, several options can be pursued. The
amplification product(s) can be used directly, or the X
and/or Y sequences can contain restriction sites (preferably
unique sites) to allow digestion with the restriction enzyme
and, if desired, separation and/or purification of the two
(or more) originally unlinked sequences. Alternatively, or in
combination, transcription promoters (e.g., T3 and T7) can be
included in the X and/or Y sequences to facilitate
transcription of the amplified sequences. Figs. 10A-10C show a
linear format of the continuous multiplex amplification
method. Figs. 11A-11C show an embodiment of circular
continuous multiplex amplification wherein bivalent primers
contain T3 and T7 promoters and the functional promoter
sequences are thereby introduced into the amplification
product(s).
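Two of the design criteria above, that X and Y lack substantial sequence identity and that any restriction site placed in them be unique, are easy to screen computationally. The Python sketch below is an illustration under stated assumptions: identity is measured ungapped over equal-length X and Y, the 16-nt X and Y sequences are hypothetical, the BamHI site GGATCC is an arbitrary example, and any acceptance threshold is left to the user.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(x: str, y: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(x, y))
    return 100.0 * matches / len(x)


def count_sites(seq: str, site: str) -> int:
    """Occurrences of a recognition site on either strand (palindromic sites counted once)."""
    def occurrences(pattern: str) -> int:
        return sum(1 for i in range(len(seq) - len(pattern) + 1) if seq.startswith(pattern, i))

    rc = reverse_complement(site)
    return occurrences(site) + (occurrences(rc) if rc != site else 0)


X = "ACGGATCCTTGACTGA"  # hypothetical X sequence carrying a BamHI site
Y = "TCAGCTAGAACGTCAG"  # hypothetical Y sequence
print(percent_identity(X, Y))    # 18.75
print(count_sites(X, "GGATCC"))  # 1
print(count_sites(Y, "GGATCC"))  # 0
```

In a fuller check the site count would be run over the whole expected product (for example ABCXD), since uniqueness matters across the entire molecule, not just within X.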

The following examples are given to illustrate the 
invention, but are not to be limiting thereof. 


EXPERIMENTAL EXAMPLES 

The following examples are offered by way of example 
and not by way of limitation. Variations and alternate 
embodiments will be apparent to those of skill in the art. 


Example 1. LacZ alpha gene reassembly 

This example shows that small fragments having
overlapping regions of homology can be amplified and
reassembled by PCR amplification methods in the absence of any
primer.

1) Substrate preparation 

The substrate for the reassembly reaction was the dsDNA
polymerase chain reaction ("PCR") product of the wild-type
LacZ alpha gene from pUC18 (GenBank No. X02514). The primer
sequences were 5'AAAGCGTGGATTTTTGTGAT3' (SEQ ID NO:1) and
5'ATGGGGTTCCGCGCACATTT3' (SEQ ID NO:2). The free primers were
removed from the PCR product by Wizard PCR prep (Promega,
Madison WI) according to the manufacturer's directions. The
removal of the free primers was found to be important.

2) DNase I digestion

About 5 µg of the DNA substrate was digested with 0.15
units of DNase I (Sigma, St. Louis MO) in 100 µl of 50 mM
Tris-HCl pH 7.4, 1 mM MgCl2, for 10-20 minutes at room
temperature. The digested DNA was run on a 2% low melting
point agarose gel. Fragments of 10-70 base pairs (bp) were
purified from the 2% low melting point agarose gels by
electrophoresis onto DE81 ion exchange paper (Whatman,
Hillsborough OR). The DNA fragments were eluted from the
paper with 1 M NaCl and ethanol precipitated.

3) DNA Reassembly 

The purified fragments were resuspended at a
concentration of 10-30 ng/µl in PCR Mix (0.2 mM each dNTP,
2.2 mM MgCl2, 50 mM KCl, 10 mM Tris-HCl pH 9.0, 0.1% Triton
X-100, 0.3 µl Taq DNA polymerase, 50 µl total volume). No
primers were added at this point. A reassembly program of
94°C for 60 seconds, 30-45 cycles of [94°C for 30 seconds,
50-55°C for 30 seconds, 72°C for 30 seconds] and 5 minutes at
72°C was used in an MJ Research (Watertown MA) PTC-150
thermocycler. The PCR reassembly of small fragments into
larger sequences was followed by taking samples of the
reaction after 25, 30, 35, 40, and 45 cycles of reassembly.

Whereas the reassembly of 100-200 bp fragments can yield
a single PCR product of the correct size, 10-50 base fragments
typically yield some product of the correct size, as well as
products of heterogeneous molecular weights. Most of this
size heterogeneity appears to be due to single-stranded
sequences at the ends of the products, since after restriction
enzyme digestion a single band of the correct size is
obtained.

4) PCR with primers

After dilution of the reassembly product into the PCR Mix
with 0.8 µM of each of the above primers (SEQ ID NOs: 1 and 2)
and about 15 cycles of PCR, each cycle consisting of (94°C for
30 seconds, 50°C for 30 seconds and 72°C for 30 seconds), a
single product of the correct size was obtained.

5) Cloning and analysis 
The PCR product from step 4 above was digested with the
terminal restriction enzymes BamHI and EcoO109I and gel
purified as described above in step 2. The reassembled
fragments were ligated into pUC18 digested with BamHI and
EcoO109I. E. coli were transformed with the ligation mixture
under standard conditions as recommended by the manufacturer
(Stratagene, San Diego CA) and plated on agar plates having
100 µg/ml ampicillin, 0.004% X-gal and 2 mM IPTG. The
resulting colonies having the HindIII-NheI fragment which is
diagnostic for the ++ recombinant were identified because they
appeared blue.

This Example illustrates that a 1.0 kb sequence carrying
the LacZ alpha gene can be digested into 10-70 bp fragments,
and that these gel-purified 10-70 bp fragments can be
reassembled into a single product of the correct size, such
that 84% (N=377) of the resulting colonies are LacZ+ (versus
94% without shuffling). This principal finding is extended
substantially in the present invention to assemble component
polynucleotides into product polynucleotides, and the
component polynucleotides are not limited to randomly digested
fragments of a naturally occurring gene sequence.

The DNA encoding the LacZ gene from the resulting LacZ-
colonies was sequenced with a sequencing kit (United States
Biochemical Co., Cleveland OH) according to the manufacturer's
instructions, and the genes were found to have point mutations
due to the reassembly process (Table 1). Eleven of the 12
types of substitution were found, and no frameshifts.
TABLE 1

Mutations introduced by mutagenic shuffling

Transitions   Frequency   Transversions   Frequency
G-A           6           A-T             1
A-G           4           A-C             2
C-T           7           C-A             1
T-C           3           C-G             0
                          G-C             3
                          G-T             2
                          T-A             1
                          T-G             2

A total of 4,437 bases of shuffled lacZ DNA were
sequenced.

The rate of point mutagenesis during DNA reassembly from
10-70 bp pieces was determined from DNA sequencing to be 0.7%
(N=4,473), which is similar to error-prone PCR. Without being
limited to any theory, it is believed that the rate of point
mutagenesis may be lower if larger fragments are used for the
reassembly, or if a proofreading polymerase is added.
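The 0.7% figure can be checked against Table 1. The Python sketch below simply tallies the substitution counts listed in the table and divides by the 4,437 sequenced bases stated beneath it; the text gives N=4,473 instead, and either denominator gives about 0.7%. The variable names are illustrative.

```python
# Substitution counts copied from Table 1.
TRANSITIONS = {"G->A": 6, "A->G": 4, "C->T": 7, "T->C": 3}
TRANSVERSIONS = {"A->T": 1, "A->C": 2, "C->A": 1, "C->G": 0,
                 "G->C": 3, "G->T": 2, "T->A": 1, "T->G": 2}
BASES_SEQUENCED = 4437  # from the note under Table 1

n_transitions = sum(TRANSITIONS.values())
n_transversions = sum(TRANSVERSIONS.values())
total = n_transitions + n_transversions
rate = total / BASES_SEQUENCED
types_seen = sum(1 for n in (*TRANSITIONS.values(), *TRANSVERSIONS.values()) if n > 0)

print(n_transitions, n_transversions, total)               # 20 12 32
print(f"{100 * rate:.2f}%")                                # 0.72%
print(f"{types_seen} of 12 substitution types observed")   # 11 of 12 substitution types observed
print(f"{100 * total / 4473:.2f}%")                        # 0.72%
```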

When plasmid DNA from 14 of these point-mutated LacZ-
colonies was combined and again reassembled/shuffled by the
method described above, 34% (N=291) of the resulting colonies
were LacZ+, and these colonies presumably arose by
recombination of the DNA from different colonies.

The expected rate of reversal of a single point mutation
by error-prone PCR, assuming a mutagenesis rate of 0.7% (10),
would be <1%.
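The <1% estimate follows from a simple argument, sketched here under two assumptions not stated in the text: a given mutated position is hit with per-base probability μ ≈ 0.7% in a round of error-prone amplification, and the three possible substitutions at that position are equally likely, so only one in three restores the original base. On that basis reversion alone falls far short of the 34% LacZ+ colonies observed after shuffling the pooled mutants.

```latex
P_{\text{reversion}} \approx \frac{\mu}{3} = \frac{0.007}{3} \approx 0.0023 \; (0.23\%) < 1\%
```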
Thus large DNA sequences can be reassembled from a random
mixture of small fragments by a reaction that is surprisingly
efficient and simple. One application of this technique is
the recombination or shuffling of related sequences based on
homology. A second application is the assembly of a large
product polynucleotide by PCR amplification of component
polynucleotides (oligonucleotides) having overlapping regions
of homology that form annealed joints during PCR amplification.

Example 2. One-Step Circular Plasmid Assembly From Oligonucleotides

This example demonstrates assembly of a 2.71 kb
plasmid, p182SfiI (Stemmer (1994) Nature 370: 389), which
encodes the gene and promoter region for R-TEM1 β-lactamase.
A collection of 132 component oligonucleotides, each 40 bases
in length, as well as one 56-mer and one 47-mer (see Fig.
5A-E), were synthesized and used to assemble the plasmid by
end-complementary polymerase reaction (ECPR) employing the
overlapping ends of the oligonucleotides. This collection of
component polynucleotides collectively encodes the plasmid
p182SfiI. The plus strand and the minus strand were each
initially directed by oligonucleotides 40 nucleotides long
which, upon assembly, overlapped by 20 nucleotides (Fig. 6).

The oligonucleotides were synthesized and 5'-phosphorylated
simultaneously on a 96-well parallel-array DNA synthesizer
using standard phosphoramidite chemistry. After cleavage from
the solid support and deprotection, the dried-down
oligonucleotides were resuspended in distilled water and used
without further purification.
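The tiling scheme described above, in which plus- and minus-strand 40-mers are offset so that each overlaps its neighbours on the opposite strand by 20 nucleotides, can be sketched in Python. This is a simplified illustration, not the procedure used to design the oligonucleotides of Fig. 5: it assumes the circular sequence length is an exact multiple of the oligo length (the actual set also used a 56-mer and a 47-mer), uses a hypothetical random 120-nt circle in place of p182SfiI, and ignores 5'-phosphorylation. The function names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def tile_circle(circle: str, length: int = 40) -> tuple[list[str], list[str]]:
    """Plus-strand oligos at 0, L, 2L, ...; minus-strand oligos (written 5'->3') shifted by L/2."""
    if length % 2 or len(circle) % length:
        raise ValueError("need an even oligo length that divides the circle length")
    half = length // 2
    doubled = circle + circle  # lets the last minus-strand oligo span the origin of the circle
    plus = [circle[i:i + length] for i in range(0, len(circle), length)]
    minus = [reverse_complement(doubled[i + half:i + half + length])
             for i in range(0, len(circle), length)]
    return plus, minus


def joints_ok(plus: list[str], minus: list[str]) -> bool:
    """Each minus oligo must pair with the 3' half of plus[i] and the 5' half of plus[i + 1]."""
    n = len(plus)
    for i, m in enumerate(minus):
        covered = reverse_complement(m)  # plus-strand region this oligo anneals to
        half = len(covered) // 2
        if covered[:half] != plus[i][half:] or covered[half:] != plus[(i + 1) % n][:half]:
            return False
    return True


# Hypothetical 120-nt circular sequence (3 plus-strand and 3 minus-strand 40-mers).
rng = random.Random(1)
circle = "".join(rng.choice("ACGT") for _ in range(120))
plus, minus = tile_circle(circle)
print(len(plus), len(minus), joints_ok(plus, minus))  # 3 3 True
```

Each minus-strand oligo bridges two adjacent plus-strand oligos, including the pair that spans the origin, which is what allows the set to close into a circle rather than a linear product.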

The oligonucleotides were diluted to a final
concentration (all oligos combined) of 1 µM (14 ng/µl) in 20 µl
of GeneAmp XL PCR Mix (Perkin-Elmer, Branchburg, NJ; 0.2 mM
each dNTP, 2.2 mM MgCl2, 50 mM KCl, 10 mM Tris-HCl pH 9.0,
0.1% Triton X-100). An aliquot of the reaction mix (A) was
electrophoresed on an agarose gel (Fig. 7). The reaction was
started with 5 µl of a 50:1 (v/v) mixture of Taq polymerase
(Promega, Madison, WI) and Pfu polymerase (Stratagene, La
Jolla, CA) such that 1 unit of Taq and 0.02 unit of Pfu
polymerase were added. The PCR program consisted of 40°C for
2 minutes, 72°C for 10 seconds, then 40 cycles of (94°C for
15 seconds, 40°C for 30 seconds, and 72°C for [10 seconds +
1 second/cycle]). An aliquot of the resulting reaction product
(B) was electrophoresed on an agarose gel (Fig. 7); the
remainder was then diluted 3x with XL PCR Mix and enzyme and
amplified with the following program: 25 cycles of (94°C for
15 seconds, 40°C for 30 seconds, and 72°C for [45 seconds +
1 second/cycle]). An aliquot of the resulting reaction product
(C) was electrophoresed on an agarose gel (Fig. 7), and the
remainder was then diluted 3x with XL PCR Mix and enzyme and
amplified with the following program: 20 cycles of (94°C for
15 seconds, 40°C for 30 seconds, and 72°C for [70 seconds +
1 second/cycle]). An aliquot of the resulting reaction product
(D) was electrophoresed on an agarose gel (Fig. 7).
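Because the extension step lengthens by 1 second per cycle, the time at 72°C grows through each stage. The Python sketch below tallies the three cycling stages (B, C and D) described above. It is an illustration under stated assumptions: the increment is taken to start at zero in the first cycle of each stage, ramp times and the initial 40°C/72°C steps are ignored, and the `Stage` class is hypothetical, not a thermocycler API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cycles: int
    denature_s: int          # 94°C step
    anneal_s: int            # 40°C step
    extend_base_s: int       # 72°C step in the first cycle of the stage
    extend_increment_s: int  # seconds added to the 72°C step in each later cycle

    def extension_time(self, cycle: int) -> int:
        """Extension time in seconds for a 0-based cycle index."""
        return self.extend_base_s + self.extend_increment_s * cycle

    def hold_time(self) -> int:
        """Seconds spent at the three set temperatures over the whole stage (ramping ignored)."""
        return sum(self.denature_s + self.anneal_s + self.extension_time(c)
                   for c in range(self.cycles))


STAGES = {
    "B": Stage(cycles=40, denature_s=15, anneal_s=30, extend_base_s=10, extend_increment_s=1),
    "C": Stage(cycles=25, denature_s=15, anneal_s=30, extend_base_s=45, extend_increment_s=1),
    "D": Stage(cycles=20, denature_s=15, anneal_s=30, extend_base_s=70, extend_increment_s=1),
}

for name, stage in STAGES.items():
    last = stage.extension_time(stage.cycles - 1)
    print(f"{name}: last extension {last} s, hold time {stage.hold_time() / 60:.1f} min")
# B: last extension 49 s, hold time 49.7 min
# C: last extension 69 s, hold time 42.5 min
# D: last extension 89 s, hold time 41.5 min
```

Under this reading, stage B ends at a 49 s extension, close to the 45 s start of stage C, and stage C ends at 69 s just before stage D starts at 70 s, so the three programs form a nearly continuous ramp in extension time as the products lengthen.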

An aliquot of the reaction product (D) was
electrophoresed on an agarose gel, as was an aliquot
digested with BamHI; this verified assembly of large DNA
molecules consistent with formation of large concatemers,
which were resolved to unit length by BamHI digestion. The PCR
product obtained by this method was shown to be concatemeric,
and was resolvable by BamHI digestion into a single 2.7 kb
band by agarose gel electrophoresis.
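Why a single unit-length band is expected can be seen with a toy digestion. In a head-to-tail concatemer of a circular plasmid that has one BamHI site (G^GATCC), every internal fragment runs from one site to the next and therefore has exactly the unit length; only the two ragged ends differ. The Python sketch below models top-strand cut positions only (it ignores the 4-nt 5' overhangs) and uses a hypothetical 60-nt unit in place of the 2.71 kb plasmid.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut a linear sequence `cut_offset` nt into every occurrence of `site` (top strand only)."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


BAMHI_SITE, BAMHI_CUT = "GGATCC", 1  # BamHI cuts G^GATCC

unit = "GGATCC" + "AAACCCTTTG" * 5 + "TTTT"   # hypothetical 60-nt plasmid with one BamHI site
concatemer = unit[17:] + unit * 3 + unit[:9]  # ragged ends around three full repeats

pieces = digest(concatemer, BAMHI_SITE, BAMHI_CUT)
print([len(p) for p in pieces])                              # [44, 60, 60, 60, 8]
print(all(p == unit[1:] + unit[:1] for p in pieces[1:-1]))   # True: each is the opened circle
```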

Aliquots of reaction product (D) were digested with
various restriction enzymes (shown in Fig. 7). Fig. 7 shows
that the unique cutters EcoRI and BamHI liberated a 2.71 kb
fragment consistent with the size of the complete 2.71 kb
plasmid p182SfiI (Stemmer (1994) Nature 370: 389, incorporated
herein by reference). Furthermore, the digestion results with
NcoI, SfiI, PstI, and BglII all yielded fragments consistent
with the restriction map of the complete 2.71 kb plasmid
p182SfiI.

After digestion of the PCR product with BamHI, the
2.7 kb fragment was gel purified, ligated with DNA ligase, and
transformed into E. coli K-12. Tetracycline-resistant
transformants were selected.

This example demonstrates that the circular DNA
assembly method allows for rapid and inexpensive construction
of long DNA sequences, such as genes, gene libraries,
plasmids, viral genomes, etc. The assembly method facilitates
several mutagenesis approaches, such as point mutagenesis,
combinatorial cassette mutagenesis, and doping, or mixing in
other nucleotides during oligonucleotide synthesis.

Deliberate modifications to the DNA sequence can be made
simply by substituting one or more new oligos followed by
reassembly. To reduce the rate of PCR mutagenesis during
assembly, a proofreading polymerase can be added; combining
high processivity with proofreading can assure efficient
long-read PCR reactions.
Example 3. Antibody Germline Assembly from Oligos With Rolling
Circle Concatemeric Amplification

A scFv antibody with germline sequences (VH251 and
VK25) was constructed from 19 oligonucleotides by cyclized
assembly. The oligos were at 2-20 ng per µl in PCR Mix, and
the program was 20 cycles of (94°C for 15 s, 48°C for 30 s,
72°C for 30 s + 1 s/cycle). The size of the product of this
reaction was 200-500 bp. The PCR product was diluted 4-fold in
PCR Mix and PCR was run for 24 cycles of (94°C for 15 s, 55°C
for 30 s, 72°C for 30 s + 8 s/cycle), followed by one
additional 3-fold dilution and 20 cycles of (94°C for 15 s,
55°C for 30 s, 72°C for 30 s + 8 s/cycle). The product was
>50 kilobases, and digestion with SfiI and NotI resulted in a
single DNA fragment of the correct size.

Fig. 8 shows a schematic for end-complementary
polymerase reaction (ECPR) in conjunction with parallel-
processing PCR to amplify very large polynucleotides, such as
those larger than can be amplified reliably by conventional
PCR using only a single primer set.


Example 4. Plasmid Assembly With Rolling Circle Concatemeric
Amplification

pGJ103 is a 5.5 kilobase plasmid containing an
intact ars operon (Ji and Silver (1992) Proc. Natl. Acad. Sci.
(USA) 89: 9474).

In one example, pGJ103 was digested with DNase I into
random 100-400 bp fragments, which were reassembled by
circular shuffling in PCR Mix with a program of 50 cycles of
(94°C for 15 s, 68°C for 30 s + 8 s/cycle), using three
different concentrations of fragments. Each reassembly yielded
a product of >50 kb, which was digested with BamHI to yield a
single band of the correct (predicted) size, which was
ligated, transformed into E. coli, and plated on increasing
levels of arsenate to select for up-mutants.

Cells and plasmids. Plasmid pGJ103 is a pUC19 derivative
containing the 2.5 kb arsenic resistance operon from S. aureus
plasmid pI258. E. coli strain TG1 was obtained commercially
(Pharmacia, Tarrytown, NJ). Sodium arsenate (Sigma) was used
as a 2.5 M stock solution, neutralized to pH 7 with NaOH.
Selection for increased arsenate resistance was performed at
35°C on agar plates with LB medium (Life Technologies)
containing varying concentrations of arsenate.

The 5.5 kb plasmid pGJ103 was fragmented by
sonication into fragments of 400-1500 bp and reassembled by
PCR using Perkin Elmer XL-PCR reagents with 10% PEG-6000,
without added primers. The PCR program for the
assembly was SO'^C 30 s, then 60 cycles of: 94°C 20 s, 40-45°C
30 s, 72°C 39 s + 1 s per cycle in a PTC-150 minicycler (MJ
Research, Watertown, MA). The PCR process yielded plasmid
multimers of about 15 to 40 kb in size, which were digested
into 5.5 kb monomers with the restriction enzyme BamHI, which
has a single unique site in plasmid pGJ103. The 5.5 kb
plasmid monomer was purified from an agarose gel after
electrophoresis and, after self-ligation, was electroporated
into electrocompetent E. coli TG1 cells.


Arsenate resistance selection. Transformed E. coli cells were
plated on LB plates containing a range of concentrations of
sodium arsenate and incubated at 37°C for 24 hrs., and at
least 1,000 colonies from the plates with the highest arsenate
levels were pooled by scraping the plates. The harvested
cells were grown in liquid in the presence of the same
concentration of arsenate as in the petri dish, and a plasmid
pool was prepared from this liquid culture. Rounds 2-4 were
identical to round 1, except that the cells were plated at
higher arsenate levels.

Arsenate resistance quantification. Induced inoculum cells of
E. coli TG1(pGJ103), carrying the wild-type ars operon, and of
TG1 with mutant pGJ103 plasmid pools were grown overnight at
37°C in 2 mM or 50 mM arsenate, respectively. Equal amounts of
cells (by turbidity as OD600nm) were plated on plates
containing a range of concentrations of arsenate and grown for
18 hrs. at 37°C. Cell growth was quantitated by resuspending
the cells and measuring the OD600nm.

Arsenate detoxification assay. The ability of E. coli
constructs to detoxify arsenate was measured by an intact-cell
arsenate reduction assay using radioactive 73AsO4^3- as
substrate and separation of arsenate and arsenite by thin-
layer chromatography, followed by quantitation in an Ambis
radioactive counter.


DNA sequencing. The sequence of the entire operon after
selection was determined by dideoxy DNA sequencing using
fluorescent terminating substrate and an ABI sequencer.

Results and discussion. The wild-type plasmid pGJ103 ars
operon confers on E. coli strain TG1 resistance to up to 4 mM
arsenate when grown on LB plates at 37°C for 24 hrs.
Selection round one, which was plated on 2, 4, 8, 16 and 32 mM
arsenate, yielded about 2,000 colonies growing at 16 mM
arsenate. Selection round two was plated on 16, 32, 64 and 128
mM arsenate and yielded about 4,000 colonies growing at 64 mM
arsenate. Round three was plated at 64, 128 and 256 mM
arsenate and yielded about 1,500 colonies at 128 mM arsenate,
and round 4 was plated on 128, 256 and 512 mM arsenate.
Colonies were harvested from the plates with 256 mM arsenate
and replated on 200, 300 and 400 mM arsenate. Single colonies
from plates with 400 mM arsenate were grown in liquid culture
with 400 mM arsenate, frozen at -70°C, and used for further
characterization. DNA shuffling increased resistance levels
to arsenate (as selected) and also to arsenite and antimony
salts (Fig. 12), which are the two toxic oxyanions to which
resistance requires the ArsB membrane transporter but not the
ArsC arsenate reductase enzyme. In this growth experiment,
done with the pool from three cycles of DNA shuffling (which
retained good growth in LB broth), not only was growth clear
above 100 mM AsO4^3-, but increased resistance to arsenite
(AsO2^-) and antimony (SbO3^+) was clearly shown. These
results require mutational effects beyond those possibly
limited to the arsC gene, which affects resistance to arsenate
alone.

Chromosomal integration. Cells selected and grown at and
above 128 mM arsenate showed reduced growth, lower cell
growth yields, and low and variable plasmid yields.
Plasmids were isolated that had apparently lost the arsenate
operon, and most cells showed a complete loss of plasmids. It
appeared that the DNA shuffling plus selection for high
arsenate resistance resulted in integration of the ars operon
into the E. coli chromosome, since the ars operon could be
recovered from chromosomal DNA of clones which had lost the
entire plasmid by conventional PCR amplification with
"upstream" and "downstream" oligonucleotide primers.

Integration mechanism. The arsenate resistance operon of
plasmid pGJ103 is flanked on both sides by 200 bp inverted
homologous regions, which appear to be the terminal portions
of site-specific recombinase genes. Attempts to recover the
operon from the total cellular DNA of highly resistant cells
by PCR showed that oligonucleotide primers near the inside
ends of the recombinase genes, immediately flanking the
arsenate genes, yielded a PCR product of the correct size
(2.0 kb) and with the expected restriction nuclease site pattern.
However, primers located toward the middle or near the outside
ends of the 200 bp homologous sequences did not yield the
predicted PCR products with the intact ars operon.
Presumably, chromosomal integration was selected because the
integrated operon somehow resulted in increased arsenate
resistance, and the homologous sequences at the ends of the ars
operon facilitated chromosomal integration by recombination.
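"Inverted" homologous regions means that, read on the same strand, the repeat on one side of the operon is the reverse complement of the repeat on the other side. The minimal Python sketch below illustrates that relationship. The sequences are short hypothetical stand-ins, not the pGJ103 sequence. The function names and the 90% identity threshold used to represent "homologous" are also assumptions.

```python
# Minimal sketch: test whether the two ends of a sequence form an inverted repeat.
# Sequences and the identity threshold are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def has_inverted_flanks(seq: str, repeat_len: int, min_identity: float = 0.9) -> bool:
    """True if the last repeat_len bases are (nearly) the reverse complement
    of the first repeat_len bases, i.e. the flanks are inverted repeats."""
    left = seq[:repeat_len]
    right = seq[-repeat_len:]
    return identity(left, reverse_complement(right)) >= min_identity


if __name__ == "__main__":
    flank = "AAAACCCCGG"   # hypothetical 10 bp stand-in for a 200 bp region
    core = "ATGTCGTTAGC"   # hypothetical stand-in for the ars genes
    inverted = flank + core + reverse_complement(flank)
    direct = flank + core + flank
    print(has_inverted_flanks(inverted, len(flank)))
    print(has_inverted_flanks(direct, len(flank)))
```

The sketch only checks the sequence relationship between the flanks. It does not model recombination.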

Chromosomal ars operon. The chromosome of E. coli normally
contains an arsenate resistance operon which is distantly
related to the pI258 operon and results in a low level of
arsenate resistance. The operon which was recovered from the
chromosome of highly resistant cells by PCR was shown by
restriction mapping and by DNA sequencing to be derived from
the pI258 operon, and not from the E. coli K-12 chromosomal
operon.

Chromosomal shuffling. Because the cells recovered from
128 mM arsenate did not contain plasmid DNA, the shuffling for
round 4 was performed on the PCR product which was obtained
from the chromosomal DNA of the cells selected in round 3.
This PCR product was combined with a 10-fold lower molar
amount of the plasmid DNA obtained from round 2 cells, and the
mixture was fragmented, shuffled and selected as for earlier
rounds.
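Mixing two DNAs at a defined molar ratio means converting each mass to moles using the molecule's length. The minimal Python sketch below shows that arithmetic. It assumes an average of 660 g/mol per base pair of double-stranded DNA. It also assumes the PCR product is the roughly 2.0 kb product described above. The 1,000 ng input and the 5,000 bp plasmid length are hypothetical placeholders, because the size of the round 2 plasmid is not stated here.

```python
# Minimal sketch: how much plasmid DNA gives a 10-fold lower molar amount
# than a given mass of PCR product. Lengths and masses are illustrative.
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one dsDNA base pair


def pmol_from_ng(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ng) to picomoles."""
    return (mass_ng * 1e-9) / (length_bp * BP_MASS_G_PER_MOL) * 1e12


def ng_from_pmol(amount_pmol: float, length_bp: int) -> float:
    """Convert picomoles of double-stranded DNA to a mass in ng."""
    return amount_pmol * 1e-12 * length_bp * BP_MASS_G_PER_MOL * 1e9


def plasmid_ng_for_ratio(pcr_ng: float, pcr_bp: int, plasmid_bp: int,
                         fold_lower: float = 10.0) -> float:
    """Plasmid mass giving `fold_lower` times fewer molecules than the PCR product."""
    plasmid_pmol = pmol_from_ng(pcr_ng, pcr_bp) / fold_lower
    return ng_from_pmol(plasmid_pmol, plasmid_bp)


if __name__ == "__main__":
    pcr_ng, pcr_bp = 1000.0, 2000  # assumed input: 1 ug of a 2.0 kb product
    plasmid_bp = 5000              # hypothetical plasmid length
    print(round(pmol_from_ng(pcr_ng, pcr_bp), 3))
    print(round(plasmid_ng_for_ratio(pcr_ng, pcr_bp, plasmid_bp), 1))
```

Both species are double-stranded DNA, so the per-base-pair constant cancels. The plasmid mass therefore reduces to pcr_ng × (plasmid_bp / pcr_bp) / 10, which is 250 ng for the placeholder values.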

Cloning and characterization of the integrated operon. The
conventional PCR product which was obtained from the
chromosomal DNA of cells grown at 400 mM arsenate was cloned
into the polylinker site of pUC19. This construct was similar
to pGJ103 except that it lacked the 200 bp inverted homologous
DNA flanking the arsenate operon. Cells containing this
plasmid were resistant only up to about 10 mM arsenate. The
reason for this loss of arsenate resistance level is not
known. The DNA sequence of this cloned chromosomal operon
showed thirteen base changes relative to the original
sequence. The arsR gene contained two silent mutations (T389C
and T429G). The arsB gene contained ten base changes, and one
base change occurred in the non-coding area past the end of
the arsC gene (G2469G). Of the ten base changes in arsB,
three resulted in amino acid alterations: base change T1281C
resulted in amino acid change L232S, base change T1317C
resulted in amino acid change F244S, and base change T1853C
resulted in amino acid change Y423H, all three involving a
change toward a more hydrophilic residue via a T to C
transition. The seven silent mutations were T961G, A976G,
T1267G, A1402G, T1730G, T1819G and T1844C.
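The substitution notation used above (reference base, position, new base) can be checked mechanically. The Python sketch below illustrates this and is not the analysis actually performed. It classifies the three amino-acid-altering arsB changes as transitions, meaning purine to purine or pyrimidine to pyrimidine. It then uses the Kyte-Doolittle hydropathy scale to check that each residue change moves toward a more hydrophilic residue. The text does not name this scale, so its use here is an assumption.

```python
# Minimal sketch: classify base substitutions and check hydropathy changes.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

# Kyte-Doolittle hydropathy values for the residues involved
# (higher = more hydrophobic); the choice of scale is an assumption.
KYTE_DOOLITTLE = {"L": 3.8, "F": 2.8, "S": -0.8, "Y": -1.3, "H": -3.2}

# Base change -> amino acid change pairs for arsB, as listed in the text.
ARSB_CODING = {"T1281C": "L232S", "T1317C": "F244S", "T1853C": "Y423H"}


def parse_change(change: str):
    """Split notation such as 'T1281C' or 'L232S' into (ref, position, alt)."""
    return change[0], int(change[1:-1]), change[-1]


def substitution_type(change: str) -> str:
    """Return 'transition' or 'transversion' for a base substitution."""
    ref, _, alt = parse_change(change)
    if ref == alt:
        raise ValueError(f"{change}: reference and new base are identical")
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def hydropathy_shift(change: str) -> float:
    """New minus old hydropathy; negative means more hydrophilic."""
    ref, _, alt = parse_change(change)
    return KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref]


if __name__ == "__main__":
    for base, aa in ARSB_CODING.items():
        print(base, substitution_type(base), aa, round(hydropathy_shift(aa), 1))
```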

Arsenate reductase activity. The arsenate reductase activity
of whole mutant cells after the third cycle was increased
about 50-fold (Fig. 13) relative to the initial wild-type
strain with plasmid pGJ103. This increase in the whole-cell
reduction rate appeared to depend on an increase in the rate
of reduction rather than on an enhanced affinity of the cells
for arsenate (data not shown). This is consistent with the
finding that the mutations occurred in the efflux transport
protein and not in the arsenate reductase itself.
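The distinction drawn above between a higher reduction rate and a higher affinity can be made concrete with the Michaelis-Menten rate law, v = Vmax·S/(Km + S). The text does not invoke this model; it is used here only as an interpretive assumption. The sketch uses hypothetical kinetic constants in arbitrary units. A 50-fold rise in Vmax raises the rate 50-fold at every substrate concentration. A 50-fold drop in Km gives a large increase only at low arsenate concentrations.

```python
# Minimal sketch: rate (Vmax) versus affinity (Km) changes under
# Michaelis-Menten kinetics. All constants are hypothetical.


def mm_rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def fold_change(s: float, reference: tuple, variant: tuple) -> float:
    """Ratio of variant to reference rate at substrate concentration s."""
    return mm_rate(s, *variant) / mm_rate(s, *reference)


WILD_TYPE = (1.0, 1.0)     # (Vmax, Km), arbitrary units
HIGHER_VMAX = (50.0, 1.0)  # 50-fold faster turnover, same affinity
LOWER_KM = (1.0, 0.02)     # 50-fold higher affinity, same Vmax

if __name__ == "__main__":
    for s in (0.01, 1.0, 100.0):
        print(f"S={s}: Vmax x50 -> {fold_change(s, WILD_TYPE, HIGHER_VMAX):.1f}-fold, "
              f"Km /50 -> {fold_change(s, WILD_TYPE, LOWER_KM):.1f}-fold")
```

A fold increase that stays the same across substrate concentrations therefore points to a rate (Vmax) effect rather than an affinity (Km) effect.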

Although the present invention has been described in
some detail by way of illustration for purposes of clarity of
understanding, it will be apparent that certain changes and
modifications may be practiced within the scope of the claims.
CLAIMS:

1. A method for amplifying a target
polynucleotide, comprising:

contacting, under conditions suitable for PCR, a
target polynucleotide with a bivalent primer which comprises
two portions of complementarity to the target polynucleotide:
(1) a first portion which is in the 5' portion of the primer
and which is substantially complementary to a sequence in the
3' portion of the sequence to be amplified (target sequence)
in the target polynucleotide, and (2) a second portion which
is in the 3' portion of the primer and which is substantially
complementary to a sequence in the 5' portion of the sequence
to be amplified (target sequence) in the target
polynucleotide;

catalyzing, under suitable reaction conditions for
PCR, polynucleotide synthesis primed from the 3'-hydroxyl of
the annealed bivalent primer to form a strand complementary to
the target sequence, thereby forming a nascent complementary
strand;

denaturing the target polynucleotide and the nascent
strand and allowing reannealing with another molecule of the
bivalent primer species or with a complementary strand of a
target polynucleotide or an amplified copy thereof; and

repeating an elongation/denaturation/reannealing
cycle from 1 to about 100 times as desired, resulting in
formation of amplified product which comprises head-to-tail
concatemers of the target sequence.

2. The method of claim 1, comprising the further
step of cleaving said concatemers with a restriction
endonuclease which cuts within each concatemeric unit to form
a population of polynucleotides each consisting of an
amplified target sequence.

3. The method of claim 2, comprising the further
step of ligating the population of polynucleotides each
consisting of an amplified target sequence with DNA ligase to
form covalently closed circles.

4. The method of claim 2, comprising the further
step of ligating the population of polynucleotides each
consisting of an amplified target sequence by direct
transformation into host cells for in vivo ligation.



5. The method of claim 1, comprising the further
step of annealing, under dilute conditions suitable for
substantial intramolecular annealing and circle formation, the
nascent strand with a complementary strand of a target
polynucleotide or an amplified copy thereof to form
amplification intermediates in the form of cyclized DNA as a
result of the 3' terminus of an overlapped nascent strand
annealing to the 3' terminus of an overlapped complementary
strand which has a strand with an extendable 3'-hydroxyl which
can be extended with a DNA polymerase substantially lacking
exonuclease activity, whereby the leading terminus of the
nascent strand continually displaces the lagging portion of
the nascent strand, producing a concatemeric single strand
emanating from the amplification intermediate.

6. A method for assembling a polynucleotide from a
plurality of component polynucleotides, comprising:

a first step comprising contacting a plurality of
strands of alternating polarity which comprise substantially
complementary termini to form overlapped annealed joints in a
reaction with a polynucleotide polymerase under suitable
reaction conditions, thereby catalyzing strand elongation from
the 3'-hydroxyl portions of the overlapped joints, filling in
a portion between said overlapped joints and processively
displacing or processively degrading exonucleolytically the 5'
termini of downstream component strands of the same polarity
as the nascent strand elongates, forming double-stranded
polynucleotides;

a second step comprising denaturing of the double-
stranded polynucleotides, followed by altering the reaction
conditions permitting reannealing of complementary strands or
portions thereof to form concatemers having overlapped joints,
and permitting chain elongation by a polynucleotide polymerase
from the 3'-hydroxyl portions of the overlapped joints; and

repeating said second step from 1 to 100 times to
generate a double-stranded concatemer of a desired size.

7. The method of claim 6, wherein the component 
polynucleotides are in single-strand form. 

8. The method of claim 6, wherein the polymerase 
is Taq polymerase, TthI polymerase or Vent polymerase. 

9. The method of claim 6, wherein the step of
denaturation comprises elevating the temperature of the
reaction to at least 94 °C.

10. A kit comprising a bivalent primer and 
instructions for performing the method of claim 1. 


11. A method for producing a polynucleotide by
overlap assembly of parallel polymerase chain reaction
amplifications, comprising:

forming at least three overlapping polynucleotides,
wherein the 3' terminus of a first single-stranded
polynucleotide is substantially complementary to the 3'
terminus of a second single-stranded polynucleotide of the
opposite polarity, and wherein the 5' terminus of said second
single-stranded polynucleotide is substantially complementary
to the 3' terminus of a third single-stranded polynucleotide
having polarity identical to said first single-stranded
polynucleotide, thereby generating an overlapped
polynucleotide set capable of chain elongation by a suitable
polymerase to generate a double-stranded product spanning the
three initial overlapped polynucleotides;

synthesizing complementary strand polynucleotide
sequences catalyzed by a DNA polymerase to form a double-
stranded product spanning the overlapped polynucleotide set.
12. The method of claim 11, wherein a plurality of
primer sets are employed in a single reaction, each primer set
priming the PCR amplification of a polynucleotide sequence
which comprises terminal sequences which are complementary to
terminal sequences in at least one other amplification product
produced by a different primer set, thus generating a set of
overlapping PCR products with which an overlapped
polynucleotide set spanning the entire set of PCR products is
generated and complementary sequences synthesized by
polymerase to form a double-stranded polynucleotide spanning
the entire set of overlapping PCR products.

13. The method of claim 12, comprising the further 
step of circularizing the double-stranded polynucleotide 
spanning the entire set of overlapping PCR products and 
ligating with ligase in vitro or transforming directly into 
suitable host cells. 

14. The method of claim 12, wherein the double- 
stranded polynucleotide spanning the entire set of overlapping 
PCR products is at least 100 kilobases. 

15. A method of continuous multiplex amplification
wherein a plurality of initially unlinked polynucleotide
sequences are amplified with at least two bivalent primers
forming an amplification product comprising an equimolar ratio
of the initially unlinked polynucleotide sequences and
complementary ends.

16. The method of claim 15, wherein the multiplex 
amplification method is circular multiplex PCR amplification. 

17. The method of claim 15, wherein the multiplex 
amplification method is linear multiplex PCR amplification. 



18. The method of claim 15, wherein the multiplex
amplification method results in amplification products
comprising at least one prokaryotic or bacteriophage promoter
sequence.
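Claims 1 and 2 describe head-to-tail concatemers that a restriction enzyme cutting within each repeat unit reduces back to unit-length copies. That end result can be pictured at the string level. The Python sketch below is an idealised model, not a simulation of the polymerase reaction. The target sequence and the helper names are hypothetical. The primer is written on the sense strand, so that it would anneal to the complementary strand, and this orientation convention is itself an assumption. Units are assumed to be joined through a linker, as suggested by Figure 10C, where restriction sites are incorporated into the linkers X and Y. Here the linker is an EcoRI site (GAATTC). Cleavage is also simplified to removing the whole recognition site, rather than cutting at the enzyme's actual position within it.

```python
# Idealised string model of end-complementary concatemer formation and
# restriction cleavage. Sequences and helper names are hypothetical.
ECORI_SITE = "GAATTC"


def junction_primer(target: str, k: int, linker: str = "") -> str:
    """Hypothetical bivalent primer written on the sense strand: the last k
    bases of the target, an optional linker, then the first k bases."""
    return target[-k:] + linker + target[:k]


def concatemer(target: str, units: int, linker: str = "") -> str:
    """Head-to-tail product of `units` target copies joined by `linker`."""
    return linker.join([target] * units)


def cleave(product: str, site: str) -> list:
    """Simplified digest: split at each site, dropping the site itself."""
    return [piece for piece in product.split(site) if piece]


if __name__ == "__main__":
    target = "ATGGCTAGCAAGTTCTGA"      # hypothetical target sequence
    assert ECORI_SITE not in target    # the site must occur only in the linker
    primer = junction_primer(target, 4, ECORI_SITE)
    product = concatemer(target, 3, ECORI_SITE)
    print(primer)                      # spans each end-to-start junction
    print(product.count(primer))       # one match per junction
    print(cleave(product, ECORI_SITE) == [target] * 3)
```

The model shows why the restriction site has to be absent from the target itself. If the target contained the site, digestion would also cut inside the units instead of releasing intact copies.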
[Oligonucleotide sequence listing figures; the scanned sequence text is not legible and is not reproduced.]
134 40-mers

Figure 8
Continuous Circular Multiplex PCR

[Schematic panels: Initial Conditions; Anneal with Primers; Extend and Melt]

Figure 9A
[Schematic strand diagram]

Figure 9B
[Schematic panel: Extend]

Figure 9C
Continuous Linear Multiplex PCR

[Schematic panels: Initial Conditions; Anneal with Primers; Extend and Melt]

Figure 10A
[Schematic strand diagram]

Figure 10B



WO 96/33207    PCT/US96/05480    19/24



Extend: After a period, assuming equimolar primers,
only the following products are left.

[Schematic of the remaining products]

Amplify using only primers A and F'.

If necessary, digest to separate the components
using restriction sites incorporated into X and Y.

Figure 10C
Initial Conditions: sequence embedded in genomic DNA, indicated in
gray.

[Schematic strand diagram]

Anneal #1: low-concentration standard PCR primers.

[Schematic strand diagram]

Figure 11A
Anneal with Primers: The new primers are at a higher concentration than the initial
primers, so that competition with the initial primers for binding is not a problem.
Note that primers are selected for only one strand. The orientation is selected so
that the resulting fragments overlap.

[Schematic panels: Anneal with Primers; Extend and Melt (junction labeled T3' X T7)]

Figure 11B
[Schematic panel: Extend]

Figure 11C
[Plot: Percent Reduction of Arsenate]

Figure 13
International Application No.


REQUEST 


International Filing Date


The undersigned requests that the present 
international application be processed according to 
the Patent Cooperation Treaty. 


Name of receiving Office and "PCT International Application"




Applicant's or agent's file reference (if desired) (12 characters maximum): P004287WO ATM



Box No. I 



TITLE OF INVENTION 



NUCLEIC ACID BINDING PROTEINS 



Box No. II APPLICANT



Name and address: (Family name followed by given name; for a legal entity, full official designation. The
address must include postal code and name of country. The country of the address indicated in this Box is
the applicant's State (i.e. country) of residence if no State of residence is indicated below.)

Medical Research Council 
20 Park Crescent 
London 
W1N 4AL 
United Kingdom 



This person is also inventor. 



Telephone No. 



Facsimile No. 



Teleprinter No. 



State (i.e. country) of nationality: 



United Kingdom 



State (i.e. country) of residence: 



United Kingdom 



Box No. III FURTHER APPLICANT(S) AND/OR (FURTHER) INVENTOR(S)



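Name and address: (Family name followed by given name; for a legal entity, full official designation. The
address must include postal code and name of country. The country of the address indicated in this Box is
the applicant's State (i.e. country) of residence if no State of residence is indicated below.)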

CHOO, Yen 

MRC Laboratory of Molecular Biology
Medical Research Council 
Hills Road 
Cambridge 

CB2 2QH United Kingdom 



This person is:
□ applicant only
☒ applicant and inventor
□ inventor only (if this check-box is marked, do not fill in below)



State (i.e. country) of nationality: 



Greece 



State (i.e. country) of residence:



This person is applicant for the purposes of: all designated States; all designated States except the
United States of America; the United States of America only; the States indicated in the Supplemental Box



☒ Further applicants and/or (further) inventors are indicated on a continuation sheet.



Box No. IV AGENT OR COMMON REPRESENTATIVE; OR ADDRESS FOR CORRESPONDENCE

The person identified below is hereby/has been appointed to act on behalf of
the applicant(s) before the competent International Authorities as: ☒ agent □ common representative



Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country.)

MASCHIO, Antonio
D Young & Co 
21 New Fetter Lane 
London 
EC4A 1DA 
United Kingdom 



Telephone No. 



+44 1703 634816 



Facsimile No. 



+44 1703 224262 



Teleprinter No. [illegible] YOUNGS G



□ Where no agent or common representative is/has been appointed and the space above is used instead to indicate a
special address to which correspondence should be sent.



Form PCT/RO/101 (first sheet) (January 1997; reprint January 1998) 
Continuation of Box No. III FURTHER APPLICANT(S) AND/OR (FURTHER) INVENTOR(S)



If none of the following sub-boxes is used, this sheet is not to be included in the request.



Name and address: (Family name followed by given name; for a legal entity, full official designation. The
address must include postal code and name of country. The country of the address indicated in this Box is
the applicant's State (i.e. country) of residence if no State of residence is indicated below.)

ISALAN, Mark
24 Shottfield Avenue 
East Sheen 
London 
SW14 SEA 
United Kingdom 



This person is:
□ applicant only
☒ applicant and inventor
□ inventor only (if this check-box is marked, do not fill in below)

State (i.e. country) of nationality: United Kingdom

State (i.e. country) of residence: United Kingdom

This person is applicant for the purposes of: all designated States; all designated States except the
United States of America; the United States of America only; the States indicated in the Supplemental Box






"1 Further applicants and/or (further) inventors are indicated on a continuation sheet 



Form PCT/RO/101 (continuation sheet) (January 1997; reprint January 1998)



See Notes to the request form 
Sheet No. 3



Supplemental Box: If the Supplemental Box is not used, this sheet need not be included in the request.



Use this box in the following cases:

1. If, in any of the Boxes, the space is insufficient to
furnish all the information:

in particular: 

(i) if more than two persons are involved as applicants
and/or inventors and no "continuation sheet" is 
available: 



(ii) if, in Box No. II or in any of the sub-boxes of Box No. III,
the indication "the States indicated in the Supplemental 
Box" is checked: 



(iii) if, in Box No. II or in any of the sub-boxes of Box No. III,
the inventor or the inventor/applicant is not inventor for 
the purposes of all designated States or for the 
purposes of the United States of America: 

(iv) if, in addition to the agent(s) indicated in Box No. IV,
there are further agents: 

(v) if, in Box No. V, the name of any State (or OAPI) is
accompanied by the indication "patent of addition," or
"certificate of addition," or if, in Box No. V, the name of
the United States of America is accompanied by an
indication "Continuation" or "Continuation-in-part":

(vi) if there are more than three earlier applications whose 
priority is claimed: 



in such case, write "Continuation of Box No. ..."[indicate the number 
of the Box] and furnish the information in the same manner as 
required according to the captions of the Box in which the space was 
insufficient: 

in such case, write "Continuation of Box No. Ill" and indicate for each 
additional person the same type of information as required in Box No. III. 
The country of the address indicated in this Box is the applicant's State (i.e 
country) of residence if no State of residence is indicated below; 



in such case write "Continuation of Box No. Ii" or "Continuation of Box No. 
ill" or "Continuation of Boxes No. II and Hi" (as the case may be), indicate 
the name of the applicant(s) involved and. next to (each) such name. 
State(3) (and/or. where applicable. ARIPO. Eurasian. European or OAPl 
patent) for the purposes of which the named person is applicant; 

in such case write "Continuation of Box No. I!" or "ConUnuation of Box No. 
Ill" or "Continuation of Boxes No. II and 11!" (as the case may be), indicate , 
the name of the inventor(s) and. next to (each) such name. State(s) (and/or. i 
where applicable, ARtPO. Eurasian. European or OAPl patent) for the i 
purposes of which the named person is inventor; 

in such case, write "Continuation of Box No. IV and indicate for each further . 
agent the same type of information as required in Box No. IV; 



2. If the applicant claims, in respect of any designated
Office, the benefits of provisions of the national law
concerning non-prejudicial disclosures or exceptions to
lack of novelty:

CONTINUATION OF BOX IV - ADDITIONAL REPRESENTATIVES

PURVIS, William Michael Cameron
COTTER, Ivan John
PILCH, Adam John Michael
CRISP, David Norman
ROBINSON, Nigel Alexander Julian
HARRIS, Ian Richard
TURNER, James Arthur
HARDING, Charles Thomas
MALLALIEU, Catherine Louise
PRICE, Paul Anthony King
PRATT, Richard Wilson
HOLMES, Miles Keeton
HORNER, David
NACHSHEN, Neil
POTTER, Julian Mark



in such case, write "Continuation of Box No. V and the name of each State, 
involved (or OAPl). and after the name of each such State (or OAPl). the 
number of the parent title or parent application and the date of grant of the 
parent title or filing of the parent application; 



in such case, write "Continuation of Box No. VI" and indicate for each 
additional eariier application the same type of infomnation as required in 
Box No. VI. 

in such case, write "Statement Concerning Non-Prejudicial 
Disclosures or Exceptions to Lack of Novelty'' and fumtsh that 
statement below. 



Box No. V DESIGNATION OF STATES 




Sheet No. 



The following designations are hereby made under Rule 4.9(a) (mark the applicable check-boxes: at least one must be marked): 



Regional Patent

AP ARIPO Patent: GH Ghana, GM Gambia, KE Kenya, LS Lesotho, MW Malawi, SD Sudan, SZ Swaziland, UG Uganda, ZW
Zimbabwe, and any other State which is a Contracting State of the Harare Protocol and of the PCT

EA Eurasian Patent: AM Armenia, AZ Azerbaijan, BY Belarus, KG Kyrgyzstan, KZ Kazakstan, MD Republic of Moldova, RU
Russian Federation, TJ Tajikistan, TM Turkmenistan, and any other State which is a Contracting State of the Eurasian Patent
Convention and of the PCT

EP European Patent: AT Austria, BE Belgium, CH and LI Switzerland and Liechtenstein, CY Cyprus, DE Germany, DK
Denmark, ES Spain, FI Finland, FR France, GB United Kingdom, GR Greece, IE Ireland, IT Italy, LU Luxembourg, MC
Monaco, NL Netherlands, PT Portugal, SE Sweden, and any other State which is a Contracting State of the European Patent
Convention and of the PCT

OA OAPI Patent: BF Burkina Faso, BJ Benin, CF Central African Republic, CG Congo, CI Côte d'Ivoire, CM Cameroon, GA
Gabon, GN Guinea, ML Mali, MR Mauritania, NE Niger, SN Senegal, TD Chad, TG Togo, and any other State which is a
member State of OAPI and a Contracting State of the PCT (if any other kind of protection or treatment desired, please specify on dotted
line)



National Patent (if other kind of protection or treatment desired, specify on dotted line)

AL Albania
AM Armenia
AT Austria
AU Australia
AZ Azerbaijan
BA Bosnia and Herzegovina
BB Barbados
BG Bulgaria
BR Brazil
BY Belarus
CA Canada
CH and LI Switzerland and Liechtenstein
CN China
CU Cuba
CZ Czech Republic
DE Germany
DK Denmark
EE Estonia
ES Spain
FI Finland
GB United Kingdom
GE Georgia
GH Ghana
GM Gambia
GW Guinea-Bissau
HU Hungary
ID Indonesia
IL Israel
IS Iceland
JP Japan
KE Kenya
KG Kyrgyzstan
KP Democratic People's Republic of Korea
KR Republic of Korea
KZ Kazakstan
LC Saint Lucia
LK Sri Lanka
LR Liberia
LS Lesotho
LT Lithuania
LU Luxembourg
LV Latvia
MD Republic of Moldova
MG Madagascar
MK The former Yugoslav Republic of Macedonia
MN Mongolia
MW Malawi
MX Mexico
NO Norway
NZ New Zealand
PL Poland
PT Portugal
RO Romania
RU Russian Federation
SD Sudan
SE Sweden
SG Singapore
SI Slovenia
SK Slovakia
SL Sierra Leone
TJ Tajikistan
TM Turkmenistan
TR Turkey
TT Trinidad and Tobago
UA Ukraine
UG Uganda
US United States of America
UZ Uzbekistan
VN Viet Nam
YU Yugoslavia
ZW Zimbabwe

Check-boxes reserved for designating States (for the purposes of a national patent) which have become party to the PCT after the issuance of this sheet:

HR Croatia
ZA South Africa
GD Grenada
AE United Arab Emirates
IN India



In addition to the designations made above, the applicant also makes under Rule 4.9(b) all designations which would be permitted under the PCT except the designation(s) of

The applicant declares that those additional designations are subject to confirmation and that any designation which is not confirmed before the expiration of 15 months from the priority date is to be regarded as withdrawn by the applicant at the expiration of that time limit. (Confirmation of a designation consists of the filing of a notice specifying that designation and the payment of the designation and confirmation fees. Confirmation must reach the receiving Office within the 15-month time limit.)



Form PCT/RO/101 (second sheet) (January 1998)



See Notes to the request form 
Sheet No. 5




Box No. VI PRIORITY CLAIM

Further priority claims are indicated in the Supplemental Box □

The priority of the following earlier application(s) is hereby claimed:

            Filing date         Application No.   Country (in which, or for      Office of filing (only for regional
            (day/month/year)                      which, the application         or international application)
                                                  was filed)
item (1)    17 Mar 1998         9805576.7         United Kingdom
item (2)    31 Mar 1998         9806895.0         United Kingdom
item (3)    3 Apr 1998          9807246.5         United Kingdom

Mark the following check-box if the certified copy of the earlier application is to be issued by the Office which for the purposes of the present international application is the receiving Office (a fee may be required):

The receiving Office is hereby requested to prepare and transmit to the International Bureau a certified copy of the earlier application(s) identified above as item(s): (1), (2), (3)



Box No. VII INTERNATIONAL SEARCHING AUTHORITY

Choice of International Searching Authority (ISA) (if two or more International Searching Authorities are competent to carry out the international search, indicate the Authority chosen; the two-letter code may be used): ISA/EPO

Earlier search: Fill in where a search (international, international-type or other) by the International Searching Authority has already been carried out or requested and the Authority is now requested to base the international search, to the extent possible, on the results of that earlier search. Identify such search or request either by reference to the relevant application (or the translation thereof) or by reference to the search request.

Country (or regional Office):    Date (day/month/year):    Number:



Box No. VIII CHECKLIST

This international application contains the following number of sheets:

1. request           5 sheets
2. description      42 sheets
3. claims            5 sheets
4. abstract          1 sheet
5. drawings          8 sheets
Total:              61 sheets

This international application is accompanied by the item(s) marked below:

1. □ fee calculation sheet
2. □ separate signed power of attorney
3. □ copy of general power of attorney
4. □ statement explaining lack of signature
5. □ priority document(s) identified in Box No. VI as item(s):
6. □ separate indications concerning deposited microorganisms
7. □ nucleotide and/or amino acid sequence listing (diskette)
8. □ other (specify):

Figure No.      of the drawings (if any) should accompany the abstract when it is published.



Box No. IX SIGNATURE OF APPLICANT OR AGENT

Next to each signature, indicate the name of the person signing and the capacity in which the person signs (if such capacity is not obvious from reading the request).

MASCHIO, Antonio



For receiving Office use only

1. Date of actual receipt of the purported international application:

2. Drawings:
   □ received:
   □ not received:

3. Corrected date of actual receipt due to later but timely received papers or drawings completing the purported international application:

4. Date of timely receipt of the required corrections under PCT Article 11(2):

6. □ Transmittal of search copy delayed

Date of receipt of the record copy by the International Bureau:

Form PCT/RO/101 (last sheet) (January 1994; reprint January 1998)



PATENT COOPERATION TREATY



From the : 
INTERNATIONAL PRELIMINARY EXAMINING AUTHORITY 



To: 



MASCHIO 18 SEP 2000

D YOUNG & CO 

21 New Fetter Lane 

London EC4A 1 DA 

GRANDE BRETAGNE



Applicant's or agent's file reference 
P004287WO ATM LSB 



PCT 



NOTIFICATION OF TRANSMITTAL OF 
THE INTERNATIONAL PRELIMINARY 
EXAMINATION REPORT 

(PCT Rule 71.1) 



Date of mailing 
(day/month/year) 



11.07.2000



IMPORTANT NOTIFICATION



International application No. 
PCT/GB99/00816 



International filing date (day/month/year) 
17/03/1999 



Priority date (day/month/year) 
17/03/1998 



Applicant 

GENDAQ LIMITED et al. 



1. The applicant is hereby notified that this International Preliminary Examining Authority transmits herewith the
international preliminary examination report and its annexes, if any, established on the international application.

2. A copy of the report and its annexes, if any, is being transmitted to the International Bureau for communication
to all the elected Offices.

3. Where required by any of the elected Offices, the International Bureau will prepare an English translation of the
report (but not of any annexes) and will transmit such translation to those Offices.

4. REMINDER

The applicant must enter the national phase before each elected Office by performing certain acts (filing
translations and paying national fees) within 30 months from the priority date (or later in some Offices) (Article
39(1)) (see also the reminder sent by the International Bureau with Form PCT/IB/301).

Where a translation of the international application must be furnished to an elected Office, that translation must
contain a translation of any annexes to the international preliminary examination report. It is the applicant's
responsibility to prepare and furnish such translation directly to each elected Office concerned.

For further details on the applicable time limits and requirements of the elected Offices, see Volume II of the
PCT Applicant's Guide.



Name and mailing address of the IPEA/

European Patent Office
D-80298 Munich
Tel. +49 89 2399 - 0 Tx: 523656 epmu d
Fax: +49 89 2399 - 4465



Authorized officer 
Christensen, J 

Tel. +49 89 2399-8052 




Form PCT/IPEA/416 (July 1992) 
RECD 12 JUL 2000

PATENT COOPERATION TREATY

PCT 

INTERNATIONAL PRELIMINARY EXAMINATION REPORT 

(PCT Article 36 and Rule 70) 



PCT 



Applicant's or agent's file reference 
P004287WO ATM LSB 


FOR FURTHER ACTION    See Notification of Transmittal of International Preliminary Examination Report (Form PCT/IPEA/416)


International application No. 
PCT/GB99/00816 


International filing date (day/month/year) 
17/03/1999 


Priority date (day/month/year) 
17/03/1998 



International Patent Classification (IPC) or national classification and IPC 
C12N15/12 



Applicant 

GENDAQ LIMITED et al.

1. This international preliminary examination report has been prepared by this International Preliminary Examining Authority
and is transmitted to the applicant according to Article 36. 



2. This REPORT consists of a total of 7 sheets, including this cover sheet. 



□ This report is also accompanied by ANNEXES, i.e. sheets of the description, claims and/or drawings which have 
been amended and are the basis for this report and/or sheets containing rectifications made before this Authority 
(see Rule 70.16 and Section 607 of the Administrative Instructions under the PCT). 



These annexes consist of a total of sheets. 



3. This report contains indications relating to the following items:

I     ☒ Basis of the report
II    □ Priority
III   □ Non-establishment of opinion with regard to novelty, inventive step and industrial applicability
IV    ☒ Lack of unity of invention
V     ☒ Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability; citations and explanations supporting such statement
VI    □ Certain documents cited
VII   □ Certain defects in the international application
VIII  ☒ Certain observations on the international application



Date of submission of the demand 



06/10/1999 



Name and mailing address of the international 
preliminary examining authority: 
European Patent Office 

D-80298 Munich 
Tel. +49 89 2399 - 0 Tx: 523656 epmu d 

Fax: +49 89 2399 - 4465 



Date of completion of this report 



11.07.2000 



Authorized officer 
Vix, O 

Telephone No. +49 89 2399 7326 




Form PCT/IPEA/409 (cover sheet) (January 1994) 
INTERNATIONAL PRELIMINARY
EXAMINATION REPORT

International application No. PCT/GB99/00816



I. Basis of the report 

1. This report has been drawn on the basis of (substitute sheets which have been furnished to the receiving Office in
response to an invitation under Article 14 are referred to in this report as "originally filed" and are not annexed to
the report since they do not contain amendments.):

Description, pages: 

1-42 as originally filed

Claims, No.: 

1-27 as originally filed

Drawings, sheets: 

1/8-8/8 as originally filed

2. The amendments have resulted in the cancellation of: 

□ the description, pages: 

□ the claims, Nos.: 

□ the drawings, sheets: 

3. □ This report has been established as if (some of) the amendments had not been made, since they have been
considered to go beyond the disclosure as filed (Rule 70.2(c)):

4. Additional observations, if necessary: 
IV. Lack of unity of invention 

1. In response to the invitation to restrict or pay additional fees the applicant has:

□ restricted the claims. 

□ paid additional fees. 

□ paid additional fees under protest. 

□ neither restricted nor paid additional fees. 



Form PCT/IPEA/409 (Boxes I-VIII, Sheet 1) (January 1994)







2. ☒ This Authority found that the requirement of unity of invention is not complied with and chose, according to Rule
68.1, not to invite the applicant to restrict or pay additional fees.

3. This Authority considers that the requirement of unity of invention in accordance with Rules 13.1, 13.2 and 13.3 is

□ complied with. 

☒ not complied with for the following reasons:
see separate sheet 

4. Consequently, the following parts of the international application were the subject of international preliminary
examination in establishing this report: 

☒ all parts.

□ the parts relating to claims Nos. . 

V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial 
applicability; citations and explanations supporting such statement 

1. Statement

Novelty (N)                       Yes: Claims 2, 4-27
                                  No:  Claims 1, 3

Inventive step (IS)               Yes: Claims 2, 4-27
                                  No:  Claims 1, 3

Industrial applicability (IA)     Yes: Claims 1-27
                                  No:  Claims

2. Citations and explanations 
see separate sheet 

VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 

see separate sheet



Form PCT/IPEA/409 (Boxes I-VIII, Sheet 2) (January 1994)

INTERNATIONAL PRELIMINARY EXAMINATION REPORT - SEPARATE SHEET
International application No. PCT/GB99/00816



Re Item IV 

Lack of unity of invention 

The common inventive concept of all claims (except claim 3) is the polypeptide's property of
binding to a modified DNA triplet containing a 5-meC but not to the equivalent unmodified base.

Claim 3 is independent and refers to a method for preparing a "Zinc finger"
polypeptide binding to a modified DNA triplet containing a 5-meC. There is no
mention that such a polypeptide does not bind the unmodified identical DNA
sequence, which widens the scope beyond that of the original claim 1, which contains this
limitation. Thus, claim 3 gives rise to a unity problem because of a lack of common
inventive concept (as required by Rule 13.1 PCT).

Accordingly, the present claims do not relate to one invention but to two separate
ones, namely:

(i) Invention 1 (claims 1-2 and 4-27): a Zinc finger polypeptide designed to bind to
a modified DNA triplet containing a 5-meC but NOT to the unmodified identical DNA
sequence.

(ii) Invention 2 (claim 3): a method to prepare a Zinc finger domain targeting a
modified DNA triplet containing a 5-meC, wherein, however, binding to the
corresponding non-modified DNA triplet is not excluded.

Re Item V 

Reasoned statement under Article 35(2) with regard to novelty, inventive step or
industrial applicability; citations and explanations supporting such statement

1. Reference is made to the following documents:

D1: WO 98 53059 A (MEDICAL RES COUNCIL; ISALAN MARK (GB); CHOO YEN (GB); KLUG AARON () 26 November 1998 (1998-11-26)
D2: WO 98 53058 A (MEDICAL RES COUNCIL; ISALAN MARK (GB); CHOO YEN (GB); KLUG AARON () 26 November 1998 (1998-11-26)
D3: WO 98 53060 A (MEDICAL RES COUNCIL; ISALAN MARK (GB); CHOO YEN (GB); KLUG AARON () 26 November 1998 (1998-11-26)



Form PCT/Separate Sheet/409 (Sheet 1) (EPO-April 1997)



2. Novelty and Inventive step (Art. 33(2)(3) PCT)

Invention 1 discloses a Zinc finger polypeptide binding to a target DNA sequence
containing a 5-meC at the central position, but not to an identical sequence with an
unmodified central C base. A method for modifying a Zinc finger polypeptide in order
to achieve the above-mentioned differentiated DNA binding property is also part of
the invention.

2.1. Claim 1 attempts to define the subject-matter in terms of a result to be achieved.
The technical features of the Zinc finger polypeptide necessary for achieving the specific
binding result have not been added. At this stage, any Zinc finger domain already known
in the prior art could be tested to check whether or not it inherently possesses
this specific binding capacity. Such a known polypeptide would destroy
the novelty of claim 1.

2.2. Claim 3 is independent and refers to a DNA binding polypeptide binding to a modified
DNA triplet containing a 5-meC. There is no mention that such a polypeptide does
not bind the unmodified identical DNA sequence, which widens the scope beyond that of the
original claim 1. As claim 3 is maintained without modification, there is a unity
problem in the present application because of a lack of common inventive concept.
Furthermore, there is a problem of novelty and inventive step, as document D1 (or
also D2-D3) explicitly discloses a method that would allow the skilled person to
design a Zinc finger domain targeting any desired DNA sequence (D1, page 2, lines
27-30). In the present application, the unexpected technical effect lies in the binding
"differentiation" between modified and unmodified identical sequences, which is
absent from the subject-matter of claim 3.

2.3. The remaining claims 2 and 4-27 deal with zinc finger polypeptides which bind to a
target DNA sequence. These claims contain technical features that render them novel
and inventive over the prior art.



Re Item VIII

Certain observations on the international application

1. Relating to Article 6 PCT:

Claim 1 refers to a Zinc finger polypeptide which is supposed to bind to a target DNA
sequence containing a modified base but not to an identical sequence with the
unmodified base. Such a claim defines only a desideratum that clearly lacks the
technical features that would allow the skilled person to realise said wish (typically a
"result to be achieved"). There is no mention of the characteristics of such a Zinc
finger polypeptide or of the type of modification of the DNA sequence. The examples
in the description deal exclusively with methylation of DNA sequences (as
mentioned in claims 3-4 and 7-8). The modification techniques focus on newly designed
Zinc finger structural features that should be able to accommodate a methylation modification
on "standard" bases (and not other types of chemical modification). A method for such
a "custom" design is detailed in claim 3, and it gives clear indications of how to modify
a zinc finger domain to accommodate a specific methylation modification in a DNA sequence.
In contrast, claim 1 and the technical features of the claims dealing with a very general
"modified base" do not enable the person skilled in the art to perform the invention
over the whole area claimed without undue burden (owing to the
need to screen a very large number of zinc finger polypeptide libraries in order
to narrow down a structural element capable of performing the specific recognition).

2. The numbering discussed in claims 3-4 and 6 is not clearly identified using the
convention adopted in the description (page 7) or in claim 10. Similarly, the definition
of the symbol "++" in claim 4 (given on page 4 of the description) has not been
incorporated in the claim.

3. Claim 21 refers to a Zif 268 protein that is described only by references to other
publications mentioned in brackets.

As a general rule, claims shall not, except where absolutely necessary, rely, in
respect of the technical features of the invention, on references to the description or
drawings (Rule 6.2(b) PCT). In the present case, an important technical feature is
defined by reference to the description in scientific articles, which could render the
scope of the claim broader or unclear. The Zinc finger domains of the Zif 268 protein have
no clear definition in terms of a clear technical feature such as a protein sequence listing.



4. Claim 22 refers to the second Zinc finger selected from the protein Zif 268. The same
observations as in point 3 above apply to claim 22.

5. The terms "randomisation" and "selection" in claim 26 are too vague and do not
clearly delimit the scope of the claim. These steps need to be clearly defined in terms
of the strategy adopted for the randomisation process (a "partial" or "total" randomisation
at every amino-acid position would lead to very different protein domains) and the type
of selection that would allow "to improve the characteristics" of the DNA binding
protein. Such "characteristics" of the protein (most likely the distinct binding capacity)
are not stated explicitly.
WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau 




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification:
C12N 15/00, C07K 14/00 



A2 



(11) International Publication Number: WO 99/47656 

(43) International Publication Date: 23 September 1999 (23.09.99)

(21) International Application Number: PCT/GB99/00816

(22) International Filing Date: 17 March 1999 (17.03.99)



(30) Priority Data:
9805576.7    17 March 1998 (17.03.98)    GB
9806895.0    31 March 1998 (31.03.98)    GB
9807246.5    3 April 1998 (03.04.98)     GB



(71) Applicant (for all designated States except US): MEDICAL RESEARCH COUNCIL [GB/GB]; 20 Park Crescent, London W1N 4AL (GB).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): CHOO, Yen [GR/GB]; 
MRC Laboratory of Molecular Biology. Medical Research 
Council, Hills Road, Cambridge CB2 2QH (GB). ISALAN, 
Nucleic Acid Binding Proteins

The present invention relates to DNA binding proteins. In particular, the invention
relates to a method for designing a protein which is capable of binding to a defined
methylated DNA sequence but not to an equivalent unmethylated DNA sequence.

Protein-nucleic acid recognition is a commonplace phenomenon which is central to a
large number of biomolecular control mechanisms which regulate the functioning of
eukaryotic and prokaryotic cells. For instance, protein-DNA interactions form the basis
of the regulation of gene expression and are thus one of the subjects most widely studied
by molecular biologists.

A wealth of biochemical and structural information explains the details of protein-DNA
recognition in numerous instances, to the extent that general principles of recognition
have emerged. Many DNA-binding proteins contain independently folded domains for
the recognition of DNA, and these domains in turn belong to a large number of structural
families, such as the leucine zipper, the "helix-turn-helix" and zinc finger families.

Despite the great variety of structural domains, the specificity of the interactions observed
to date between protein and DNA most often derives from the complementarity of the
surfaces of a protein α-helix and the major groove of DNA [Klug, (1993) Gene
135:83-92]. In light of the recurring physical interaction of α-helix and major groove, the
tantalising possibility arises that the contacts between particular amino acids and DNA
bases could be described by a simple set of rules; in effect a stereochemical recognition
code which relates protein primary structure to binding-site sequence preference.

It is clear, however, that no code will be found which can describe DNA recognition by
all DNA-binding proteins. The structures of numerous complexes show significant
differences in the way that the recognition α-helices of DNA-binding proteins from
different structural families interact with the major groove of DNA, thus precluding
similarities in patterns of recognition. The majority of known DNA-binding motifs are
not particularly versatile, and any codes which might emerge would likely describe
binding to a very few related DNA sequences.
Even within each family of DNA-binding proteins, moreover, it has hitherto appeared
that the deciphering of a code would be elusive. Due to the complexity of the
protein-DNA interaction, there does not appear to be a simple "alphabetic" equivalence
between the primary structures of protein and nucleic acid which specifies a direct amino
acid to base relationship.

International patent application WO 96/06166 addresses this issue and provides a
"syllabic" code which explains protein-DNA interactions for zinc finger nucleic acid
binding proteins. A syllabic code is a code which relies on more than one feature of the
binding protein to specify binding to a particular base, the features being combinable in
the form of "syllables", or complex instructions, to define each specific contact.

Our copending UK patent applications GB 9710805.4, 9710806.2, 9710807.0,
9710808.8, 9710809.6, 9710810.4, 9710811.2 and 9710812.0 describe improved
techniques for designing zinc finger polypeptides capable of binding desired nucleic acid
sequences. In combination with selection procedures, such as phage display, set forth for
example in WO 96/06166, these techniques enable the production of zinc finger
polypeptides capable of recognising practically any desired sequence.

Zinc finger domains studied and produced to date are capable of binding to recognition
sequences composed of any of the four nucleic acid bases: A, C, G or T (U in RNA).
However, the DNA of many organisms also includes a fifth base, 5-methylcytosine
(5-meC or, in nucleotide sequences herein, M). 5-meC arises from specific methylation
of cytosine, and is used to mark the genome or to increase its information content.
5-Methylcytosine is well known to affect protein-DNA interactions, for instance
inhibiting cleavage of DNA by certain restriction enzymes. In vertebrates, cytosine is
frequently methylated when directly preceding guanine, as in the dinucleotide CpG. This
type of methylation generally down-regulates vertebrate gene expression, and can also
prevent the binding of many eukaryotic transcription factors to DNA. Yet the zinc finger
transcription factors tested to date, Sp1 and YY1, are not affected by CpG methylation of
their DNA binding sites, suggesting that zinc fingers are incapable of discriminating
between cytosine and 5-meC.
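The M notation and the CpG context described above can be illustrated with a short Python sketch. The helper name is hypothetical, and it makes a simplifying assumption that the text does not: every cytosine directly preceding guanine on the given strand is treated as methylated, whereas the text says only that such cytosines are frequently methylated. Only the strand passed in is annotated; the complementary strand is not modelled.

```python
def mark_cpg_methylation(strand: str) -> str:
    """Rewrite a 5'->3' DNA strand so that each C in a CpG step is written M (5-meC).

    Hypothetical helper: assumes every CpG cytosine is methylated, which the
    text describes only as frequent, not universal.
    """
    strand = strand.upper()
    marked = []
    for i, base in enumerate(strand):
        # A cytosine directly preceding guanine forms the dinucleotide CpG.
        if base == "C" and i + 1 < len(strand) and strand[i + 1] == "G":
            marked.append("M")
        else:
            marked.append(base)
    return "".join(marked)


if __name__ == "__main__":
    print(mark_cpg_methylation("ACGTCCGA"))  # AMGTCMGA
```

Note that a C followed by another C (as at the fifth position of the example) is left unchanged; only the C immediately 5' to a G is rewritten.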

Since methylated cytosine bases are involved in many regulatory interactions in gene
expression, and particularly in eukaryotic, including human, gene expression, the
production of zinc finger polypeptides which specifically target methylated cytosine
bases would be highly desirable. Such polypeptides, in order to be useful, must be able to
differentiate DNA sequences in which cytosine is methylated to 5-meC from identical
non-methylated sequences.

Further nucleic acid base modifications are known in the art. For example, brominated
nucleosides are known, such as Br-dU. Being photolabile, brominated nucleosides are
useful in the determination of DNA-protein complex structure. Br-dU-containing
oligonucleotides are also useful as probes, since antibodies are available which recognise
Br-dU. Moreover, in antisense oligonucleotide chemistry, the use of backbone
modifications to improve oligonucleotide stability is well known; for example,
phosphorothioate and 2'-O methylation are commonplace. Such backbone-modified
nucleosides, and other nucleosides, may also be C-5 modified. For example, C-5 propyne
derivatives and C-5 methylpyrimidine nucleosides are known and used in antisense
nucleic acid chemistry.

Specific detection of modified nucleotides, and preferential binding of DNA-binding
proteins thereto, is desirable. However, agents which are capable of reliably targeting a
protein to a modified nucleic acid in a sequence-specific manner are not available in the
art.

Summary of the Invention 

We have now determined that modified nucleosides can be specifically recognised, over
unmodified equivalents, by zinc finger polypeptides in a sequence-dependent manner.
The invention accordingly provides a method for producing a zinc finger polypeptide
which binds to a target nucleic acid sequence containing a modified nucleic acid base, but
not to an identical sequence containing the equivalent unmodified base.



In the present invention, a "modified" base is a nucleic acid base other than A, C, G or T
as they occur in DNA in nature. Thus, the term modified includes methylated bases,
such as 5-meC which occurs naturally in DNA, and base analogues, including
naturally-occurring analogues such as U and artificial analogues such as I, backbone-modified
bases and other artificial nucleosides.
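The definition of a "modified" base reduces to a membership test against the four natural DNA bases, and the method's aim pairs each modified target with an identical unmodified sequence that the polypeptide should not bind. The Python sketch below illustrates both ideas under stated assumptions: the helper names are hypothetical, bases are single-letter codes (M for 5-meC as in this text, with U and I for the analogues named above), and only M is mapped back to an unmodified equivalent (C), since the text pairs no other analogue with one.

```python
NATURAL_DNA_BASES = frozenset("ACGT")

# Hypothetical mapping from a modified base code to its unmodified equivalent.
# Only 5-methylcytosine (written M) is covered in this sketch.
UNMODIFIED_EQUIVALENT = {"M": "C"}


def is_modified(base: str) -> bool:
    """True for any base code other than A, C, G or T as found in natural DNA."""
    return base.upper() not in NATURAL_DNA_BASES


def unmodified_counterpart(target: str) -> str:
    """Return the identical sequence with each modified base replaced by its
    unmodified equivalent; raise ValueError if no equivalent is defined."""
    out = []
    for base in target.upper():
        if is_modified(base):
            if base not in UNMODIFIED_EQUIVALENT:
                raise ValueError(f"no unmodified equivalent defined for {base!r}")
            out.append(UNMODIFIED_EQUIVALENT[base])
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    target = "GMGTGGGMG"  # hypothetical methylated target
    print([b for b in target if is_modified(b)])  # ['M', 'M']
    print(unmodified_counterpart(target))         # GCGTGGGCG
```

In a selection setting, the second string is the kind of sequence a candidate polypeptide would be checked against to confirm that it does not bind.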

In a first embodiment, the invention provides a method for preparing a DNA binding
polypeptide of the Cys2-His2 zinc finger class capable of binding to a DNA triplet in a
target DNA sequence comprising 5-meC as the central residue in the target DNA triplet,
wherein binding to the 5-meC residue by an α-helical zinc finger DNA binding motif of
the polypeptide is achieved by placing an Ala residue at position +3 of the α-helix of the
zinc finger.

All of the DNA-binding residue positions of zinc fingers, as referred to herein, are
numbered from the first residue in the α-helix of the finger, ranging from +1 to +9. "-1"
refers to the residue in the framework structure immediately preceding the α-helix in a
Cys2-His2 zinc finger polypeptide. Residues referred to as "++" are residues present in
an adjacent (C-terminal) finger. Where there is no C-terminal adjacent finger, "++"
interactions do not operate.
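The helix numbering can be made concrete by converting a position into an index in a finger's amino acid sequence. In this Python sketch the function names, the finger fragment and its helix start are hypothetical; the rules taken from the text are that +1 is the first α-helix residue, positions run to +9, -1 is the framework residue immediately preceding the helix (so there is no position 0), and, in the first embodiment, Ala sits at +3. "++" residues belong to the adjacent finger and are not modelled here.

```python
def helix_index(helix_start: int, position: int) -> int:
    """Map a helix position (-1 or +1..+9) to a 0-based sequence index.

    helix_start is the 0-based index of the first alpha-helix residue (+1).
    """
    if position == -1:
        return helix_start - 1  # residue immediately preceding the helix
    if 1 <= position <= 9:
        return helix_start + position - 1
    raise ValueError("position must be -1 or between +1 and +9")


def residue_at(finger: str, helix_start: int, position: int) -> str:
    """Return the one-letter residue at a numbered helix position."""
    index = helix_index(helix_start, position)
    if not 0 <= index < len(finger):
        raise IndexError("position lies outside the supplied sequence")
    return finger[index]


if __name__ == "__main__":
    finger = "FSRSDALTRHIRTH"  # hypothetical fragment; helix assumed to start at index 3
    start = 3
    for pos in (-1, 1, 2, 3, 6):
        print(pos, residue_at(finger, start, pos))
    # First embodiment: Ala (A) at +3 to contact a central 5-meC.
    print("Ala at +3:", residue_at(finger, start, 3) == "A")  # Ala at +3: True
```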

Cys2-His2 zinc finger binding proteins, as is well known in the art, bind to target nucleic
acid sequences via α-helical zinc metal atom co-ordinated binding motifs known as zinc
fingers. Each zinc finger in a zinc finger nucleic acid binding protein is responsible for
determining binding to a nucleic acid triplet in a nucleic acid binding sequence.
Preferably, there are 2 or more zinc fingers, for example 2, 3, 4, 5 or 6 zinc fingers, in
each binding protein. Advantageously, there are 3 zinc fingers in each zinc finger
binding protein.
The method of the present invention allows the production of what are essentially artificial DNA binding proteins. In these proteins, artificial analogues of amino acids may be used to impart desired properties to the proteins or for other reasons. Thus, the term "amino acid", particularly in the context where "any amino acid" is referred to, means any sort of natural or artificial amino acid or amino acid analogue that may be employed in protein construction according to methods known in the art. Moreover, any specific amino acid referred to herein may be replaced by a functional analogue thereof, particularly an artificial functional analogue. The nomenclature used herein therefore specifically comprises within its scope functional analogues of the defined amino acids.

The α-helix of a zinc finger binding protein aligns antiparallel to the nucleic acid strand, such that the primary nucleic acid sequence is arranged 3' to 5' in order to correspond with the N-terminal to C-terminal sequence of the zinc finger. Since nucleic acid sequences are conventionally written 5' to 3', and amino acid sequences N-terminus to C-terminus, the result is that when a nucleic acid sequence and a zinc finger protein are aligned according to convention, the primary interaction of the zinc finger is with the - strand of the nucleic acid, since it is this strand which is aligned 3' to 5'. These conventions are followed in the nomenclature used herein. It should be noted, however, that in nature certain fingers, such as finger 4 of the protein GLI, bind to the + strand of nucleic acid: see Suzuki et al. (1994) NAR 22:3397-3405 and Pavletich and Pabo (1993) Science 261:1701-1707. The incorporation of such fingers into DNA binding molecules according to the invention is envisaged.

The invention provides a solution to a problem hitherto unaddressed in the art, by permitting the rational design of polypeptides which will bind DNA triplets containing a 5-meC residue, but not identical triplets containing a C residue.

The present invention may be integrated with the rules set forth for zinc finger polypeptide design in our copending UK patent applications listed above. In a preferred aspect, therefore, the invention provides a method for preparing a DNA binding polypeptide of the Cys2-His2 zinc finger class capable of binding to a DNA triplet in a target DNA sequence comprising 5-meC, but not to an identical triplet comprising unmethylated C, wherein binding to each base of the triplet by an α-helical zinc finger DNA binding motif in the polypeptide is determined as follows:



a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg and/or position ++2 is Asp;

b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln or Glu and position ++2 is not Asp;

c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp; or position +6 is a hydrophobic amino acid other than Ala;

d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any amino acid, provided that position ++2 in the α-helix is not Asp;

e) if the central base in the triplet is G, then position +3 in the α-helix is His;

f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;

g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser, Ile, Leu, Thr or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

h) if the central base in the triplet is 5-meC, then position +3 in the α-helix is Ala, Ser, Ile, Leu, Thr or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;

j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln and position +2 is Ala;

k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn; or position -1 is Gln and position +2 is Ser;

l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp and position +1 is Arg.

The foregoing represents a set of rules which permits the design of a zinc finger binding 
protein specific for any given DNA sequence incorporating 5-meC. 
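The rules above amount to a lookup from each base of a triplet to one or more helix positions. The Python sketch below is an illustration only, not the inventors' procedure. It encodes rules a) to l), keeps only the first-listed option of each rule (in the spirit of the "default" polypeptide discussed later in this description) and writes 5-meC as `M`. The function name `design_finger`, the `M` code, the use of Asp at +3 for a central C (taken from the third embodiment) and the membership of `SMALL` are all assumptions, since the text does not say which residues count as "small".

```python
# Illustrative encoding of rules a) to l); triplets are written 5'->3'
# and "M" stands for 5-methylcytosine (an assumption of this sketch).

# Assumption: the text does not define "small residue"; this set is illustrative.
SMALL = {"Gly", "Ala", "Ser", "Thr", "Val"}

# Rules a)-d): 5' base -> (residue at +6, requirement on ++2 of the C-terminal finger)
FIVE_PRIME = {
    "G": ("Arg", None),       # a) +6 Arg (alternatively ++2 Asp)
    "A": ("Gln", "not Asp"),  # b) +6 Gln or Glu; ++2 not Asp
    "T": ("Ser", "Asp"),      # c) +6 Ser or Thr with ++2 Asp
    "C": ("any", "not Asp"),  # d) +6 any amino acid; ++2 not Asp
}

# Rules e)-h): central base -> permitted residues at +3, first-listed preferred
CENTRAL = {
    "G": ["His"],                                     # e)
    "A": ["Asn"],                                     # f)
    "T": ["Ala", "Ser", "Ile", "Leu", "Thr", "Val"],  # g)
    "M": ["Ala", "Ser", "Ile", "Leu", "Thr", "Val"],  # h) 5-meC
    "C": ["Asp"],  # third embodiment: Asp at +3 favours C over 5-meC
}

# Rules i)-l): 3' base -> residues fixed within the same finger
THREE_PRIME = {
    "G": {-1: "Arg"},            # i)
    "A": {-1: "Gln", 2: "Ala"},  # j)
    "T": {-1: "Asn"},            # k) first option only
    "C": {-1: "Asp", 1: "Arg"},  # l)
}


def design_finger(triplet: str) -> dict[str, str]:
    """Return default residue choices for a finger binding `triplet` (5'->3')."""
    five, mid, three = triplet.upper()
    plus6, plusplus2 = FIVE_PRIME[five]
    finger = {f"{pos:+d}": aa for pos, aa in THREE_PRIME[three].items()}
    finger["+6"] = plus6
    # Rules g) and h): Ala at +3 needs a small residue at -1 or +6.
    # A free choice ("any") at +6 is assumed to be able to supply one.
    small_ok = finger["-1"] in SMALL or plus6 in SMALL or plus6 == "any"
    finger["+3"] = next(aa for aa in CENTRAL[mid] if aa != "Ala" or small_ok)
    if plusplus2 is not None:
        finger["++2"] = plusplus2  # constraint on the adjacent C-terminal finger
    return finger


print(design_finger("TMG"))  # central 5-meC; Ser at +6 satisfies the Ala proviso
print(design_finger("GMG"))  # Arg at -1 and +6, so the next option (Ser) is used
```

Because rules a) to d) constrain position ++2, the `"++2"` entry is a requirement on the adjacent C-terminal finger rather than a residue of the finger being designed.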

A zinc finger binding motif is a structure well known to those in the art and defined in, for example, Miller et al. (1985) EMBO J. 4:1609-1614; Berg (1988) PNAS (USA) 85:99-102; Lee et al. (1989) Science 245:635-637; see International patent applications WO 96/06166 and WO 96/32475, corresponding to USSN 08/422,107, incorporated herein by reference.

In general, a preferred zinc finger framework has the structure:

(A) X₀₋₂ C X₁₋₅ C X₉₋₁₄ H X₃₋₆ H/C

where X is any amino acid, and the numbers in subscript indicate the possible numbers of residues represented by X.
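As a quick check of whether a sequence fits the framework, structure (A) can be read as X0-2 C X1-5 C X9-14 H X3-6 H/C and translated into a regular expression over one-letter codes. This is a minimal Python sketch under that reading; `FRAMEWORK_A` and `has_c2h2_framework` are hypothetical names, and because `re.search` is used, the leading X0-2 and any flanking linker residues are simply not constrained.

```python
import re

# Structure (A) read as X0-2 C X1-5 C X9-14 H X3-6 H/C, using one-letter codes.
# re.search ignores residues outside the match, so the leading X0-2 and any
# linker residues are not constrained here.
FRAMEWORK_A = re.compile(r"C.{1,5}C.{9,14}H.{3,6}[HC]")


def has_c2h2_framework(seq: str) -> bool:
    """Return True if `seq` contains a stretch matching structure (A)."""
    return FRAMEWORK_A.search(seq.upper()) is not None


consensus = "PYKCPECGKSFSQKSDLVKHQRTHTG"
print(has_c2h2_framework(consensus))                          # True
print(has_c2h2_framework(consensus.replace("CPEC", "CPEA")))  # False: second Cys lost
```

The final `[HC]` reflects the H/C alternative, which covers the replacement of the second His by Cys noted in this description.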

In a preferred aspect of the present invention, zinc finger nucleic acid binding motifs may 
be represented as motifs having the following primary structure: 

(B) Xᵃ C X₂₋₄ C X₂₋₃ F Xᶜ X  X  X  X  L  X  X  H  X  X  Xᵇ H - linker
                          -1 +1 +2 +3 +4 +5 +6 +7 +8 +9

wherein X (including Xᵃ, Xᵇ and Xᶜ) is any amino acid. X₂₋₄ and X₂₋₃ refer to the presence of 2 or 4, or 2 or 3, amino acids, respectively. The Cys and His residues, which together co-ordinate the zinc metal atom, are usually invariant, as is the Leu residue at position +4 in the α-helix.

Modifications to this representation may occur or be effected without necessarily abolishing zinc finger function, by insertion, mutation or deletion of amino acids. For example, it is known that the second His residue may be replaced by Cys (Krizek et al. (1991) J. Am. Chem. Soc. 113:4518-4523) and that Leu at +4 can in some circumstances be replaced with Arg. The Phe residue before Xᶜ may be replaced by any aromatic other than Trp. Moreover, experiments have shown that departures from the preferred structure and residue assignments for the zinc finger are tolerated and may even prove beneficial in binding to certain nucleic acid sequences. Even taking this into account, however, the general structure, involving an α-helix co-ordinated by a zinc atom which contacts four Cys or His residues, does not alter. As used herein, structures (A) and (B) above are taken as exemplary structures representing all zinc finger structures of the Cys2-His2 type.



Preferably, Xᵃ is F/Y-X or P-F/Y-X. In this context, X is any amino acid. Preferably, in this context X is E, K, T or S. Less preferred but also envisaged are Q, V, A and P. The remaining amino acids remain possible.

Preferably, X₂₋₄ consists of two amino acids rather than four. The first of these amino acids may be any amino acid, but S, E, K, T, P and R are preferred. Advantageously, it is P or R. The second of these amino acids is preferably E, although any amino acid may be used.

Preferably, Xᵇ is T or I.

Preferably, Xᶜ is S or T.

Preferably, X₂₋₃ is G-K-A, G-K-C, G-K-S or G-K-G. However, departures from the preferred residues are possible, for example in the form of M-R-N or M-R.

Preferably, the linker is T-G-E-K or T-G-E-K-P.

As set out above, the major binding interactions occur with amino acids -1, +3 and +6. Amino acids +4 and +7 are largely invariant. The remaining amino acids may be essentially any amino acids. Preferably, position +9 is occupied by Arg or Lys. Advantageously, positions +1, +5 and +8 are not hydrophobic amino acids, that is to say are not Phe, Trp or Tyr. Preferably, position +2 is any amino acid, and preferably serine, save where its nature is dictated by its role as a ++2 amino acid for an N-terminal zinc finger in the same nucleic acid binding molecule.

In a most preferred aspect, therefore, bringing together the above, the invention allows the definition of every residue in a zinc finger DNA binding motif which will bind specifically to a given DNA triplet incorporating a 5-meC residue as the central residue in the triplet. Where targeting of a 5-meC containing sequence is desired, therefore, a suitable zinc finger can be constructed by selecting a binding site such that 5-meC occurs at the centre of at least one base triplet thereof.
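Choosing a binding site so that 5-meC falls at the centre of a triplet is a frame-selection problem. The sketch below (hypothetical helper `candidate_sites`) slides a window of 3 × n bases over a sequence written 5' to 3' and keeps the windows in which the modified base occupies the requested triplet position (central by default). It assumes the antiparallel arrangement described above, under which the N-terminal finger (finger 1) contacts the 3'-most triplet; the example sequence is invented for illustration.

```python
def candidate_sites(seq: str, mod_index: int, n_fingers: int = 3, pos_in_triplet: int = 2):
    """Windows of 3 * n_fingers bases (seq written 5'->3') in which seq[mod_index]
    is base `pos_in_triplet` of a triplet (1 = 5' base, 2 = central, 3 = 3' base).

    Returns (start, site, finger) tuples. Fingers are numbered from the N-terminus;
    with antiparallel binding, finger 1 contacts the 3'-most triplet.
    """
    width = 3 * n_fingers
    hits = []
    for start in range(max(0, mod_index - width + 1), mod_index + 1):
        if start + width > len(seq):
            break  # later windows would run past the 3' end as well
        offset = mod_index - start
        if offset % 3 == pos_in_triplet - 1:
            triplet_from_5prime = offset // 3
            finger = n_fingers - triplet_from_5prime
            hits.append((start, seq[start:start + width], finger))
    return hits


# Hypothetical sequence; the C at index 7 is taken to be the methylated base.
print(candidate_sites("AAAGCGGCGGCGTTT", 7))
# [(0, 'AAAGCGGCG', 1), (3, 'GCGGCGGCG', 2), (6, 'GCGGCGTTT', 3)]
```

Each hit names the finger whose helix must then carry a 5-meC-compatible residue at +3; the middle-finger hit corresponds to the GCGGCGGCG arrangement used in the figures.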

The code provided by the present invention is not entirely rigid; certain choices are provided. For example, positions +1, +5 and +8 may have any amino acid allocation, whilst other positions may have certain options: for example, the present rules provide that, for binding to a central T residue, any one of Ala, Ser or Val may be used at +3. In its broadest sense, therefore, the present invention provides a very large number of proteins which are capable of binding to every defined target DNA triplet incorporating 5-meC as the central residue and thereby any DNA binding site incorporating 5-meC.

Preferably, however, the number of possibilities may be significantly reduced. For example, the non-critical residues +1, +5 and +8 may be occupied by the residues Lys, Thr and Gln respectively as a default option. In the case of the other choices, for example, the first-given option may be employed as a default. Thus, the code according to the present invention allows the design of a single, defined polypeptide (a "default" polypeptide) which will bind to its target triplet.
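To make the idea of a "default" polypeptide concrete, the sketch below assembles the helix from position -1 to +9 as a ten-letter string, the same one-letter format used for the helices listed for Figures 1c and 2. The defaults come from the text (Lys, Thr and Gln at +1, +5 and +8; the largely invariant Leu at +4 and His at +7; Arg at +9; Ser at +2), while the residues at -1, +3 and +6 must be supplied from the rules. `default_helix` is a hypothetical name, and the example input is an assumed choice for a triplet with a central 5-meC, not a tested design.

```python
# Defaults from the text: +1 Lys, +5 Thr, +8 Gln (non-critical positions),
# +4 Leu and +7 His (largely invariant), +9 Arg, +2 Ser.
DEFAULTS = {1: "K", 2: "S", 4: "L", 5: "T", 7: "H", 8: "Q", 9: "R"}
POSITIONS = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def default_helix(specified: dict[int, str]) -> str:
    """Return the helix from -1 to +9 in one-letter code, filling unspecified
    positions with the defaults; -1, +3 and +6 must be supplied."""
    missing = {-1, 3, 6} - set(specified)
    if missing:
        raise ValueError(f"positions {sorted(missing)} must be specified")
    residues = {**DEFAULTS, **specified}
    return "".join(residues[pos] for pos in POSITIONS)


# Assumed choice for a 5'-TMG-3' triplet (M = 5-meC): -1 Arg, +3 Ala, +6 Ser
print(default_helix({-1: "R", 3: "A", 6: "S"}))  # RKSALTSHQR
```

Positions set by rules j) to l) (+2 or +1) can be passed in the same dictionary and override the defaults.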

In a further aspect of the present invention, there is provided a method for preparing a DNA binding protein of the Cys2-His2 zinc finger class capable of binding to a target DNA sequence incorporating 5-meC, comprising the steps of:

a) selecting a model zinc finger domain from the group consisting of naturally occurring zinc fingers and consensus zinc fingers; and

b) mutating at least one of positions -1, +3, +6 (and ++2) of the finger as required by a method according to the present invention.
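Step b) can be sketched as locating the helix in a model finger and substituting residues at the chosen positions. The example below uses the consensus finger PYKCPECGKSFSQKSDLVKHQRTHTG given later in this description as the model, finds position -1 with a regular expression derived from structure (B), and applies substitutions keyed by helix position. `STRUCTURE_B` and `mutate_finger` are hypothetical names; restricting the aromatic before Xᶜ to Phe or Tyr is an assumption, and the substitutions shown (Arg at -1, Ala at +3, Ser at +6) are an illustrative choice for a 5'-TMG-3' triplet with a central 5-meC.

```python
import re

# Structure (B) in one-letter code: C X2-4 C X2-3 F Xc [-1] +1 +2 +3 L X X H X X Xb H.
# [FY] allows Tyr in place of Phe (assumption: an aromatic other than Trp).
STRUCTURE_B = re.compile(r"C.{2,4}C.{2,3}[FY].(?P<helix>.).{3}L.{2}H.{3}H")


def mutate_finger(finger: str, substitutions: dict[int, str]) -> str:
    """Apply one-letter substitutions keyed by helix position (-1, +1 ... +9)."""
    match = STRUCTURE_B.search(finger)
    if match is None:
        raise ValueError("sequence does not match structure (B)")
    start = match.start("helix")  # index of position -1
    residues = list(finger)
    for pos, aa in substitutions.items():
        if pos == 0 or not -1 <= pos <= 9:
            raise ValueError(f"invalid helix position {pos}")
        offset = 0 if pos == -1 else pos  # -1 immediately precedes +1
        residues[start + offset] = aa
    return "".join(residues)


model = "PYKCPECGKSFSQKSDLVKHQRTHTG"  # consensus finger from this description
print(mutate_finger(model, {-1: "R", 3: "A", 6: "S"}))
# PYKCPECGKSFSRKSALVSHQRTHTG
```

A 5' T would additionally require Asp at ++2, which lies in the adjacent finger and is outside this single-finger sketch.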

In general, naturally occurring zinc fingers may be selected from those fingers for which the DNA binding specificity is known. For example, these may be the fingers for which a crystal structure has been resolved: namely Zif268 (Elrod-Erickson et al. (1996) Structure 4:1171-1180), GLI (Pavletich and Pabo (1993) Science 261:1701-1707), Tramtrack (Fairall et al. (1993) Nature 366:483-487) and YY1 (Houbaviy et al. (1996) PNAS (USA) 93:13577-13582).

The naturally occurring zinc finger 2 in Zif268 makes an excellent starting point from which to engineer a zinc finger and is preferred.

Consensus zinc finger structures may be prepared by comparing the sequences of known zinc fingers, irrespective of whether their binding domain is known. Preferably, the consensus structure is selected from the group consisting of the consensus structure PYKCPECGKSFSQKSDLVKHQRTHTG and the consensus structure PYKCSECGKAFSQKSNLTRHQRIHTGEKP.

The consensuses are derived from the consensus provided by Krizek et al. (1991) J. Am. Chem. Soc. 113:4518-4523 and from Jacobs (1993) PhD thesis, University of Cambridge, UK. In both cases, the linker sequences described above for joining two zinc finger motifs together, namely TGEK or TGEKP, can be formed on the ends of the consensus. Thus, a P may be removed where necessary, or, in the case of the consensus terminating TG, EK(P) can be added.

When the nucleic acid specificity of the model finger selected is known, the mutation of the finger in order to modify its specificity to bind to the target DNA may be directed to residues known to affect binding to bases at which the natural and desired targets differ. Otherwise, mutation of the model fingers should be concentrated upon residues -1, +3, +6 and ++2 as provided for in the foregoing rules.

In order to produce a binding protein having improved binding, moreover, the rules provided by the present invention may be supplemented by physical or virtual modelling of the protein/DNA interface in order to assist in residue selection.
In a second embodiment, the invention provides a method for producing a zinc finger polypeptide capable of binding to a DNA sequence comprising a modified residue, but not to an identical sequence comprising an equivalent unmodified residue, comprising:

a) providing a nucleic acid library encoding a repertoire of zinc finger polypeptides, the nucleic acid members of the library being at least partially randomised at one or more of the positions encoding residues -1, +2, +3 and +6 of the α-helix of the zinc finger polypeptides;

b) displaying the library in a selection system and screening it against a target DNA sequence comprising the modified residue;

c) isolating the nucleic acid members of the library encoding zinc finger polypeptides capable of binding to the target sequence; and

d) optionally, verifying that the zinc finger polypeptides do not bind significantly to a DNA sequence identical to the target DNA sequence but containing the unmodified residue in place of the modified residue.

Methods for the production of libraries encoding randomised polypeptides are known in the art and may be applied in the present invention. Randomisation may be total, or partial; in the case of partial randomisation, the selected codons preferably encode options for amino acids as set forth in the rules of the first embodiment of the present invention. Thus, the first and second embodiments may advantageously be combined.
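One practical reason to restrict randomisation to the options allowed by the rules is library size. The following Python sketch counts distinct helix variants at the protein level (ignoring codon redundancy and stop codons) for total randomisation at -1, +2, +3 and +6, and for the same library with +3 limited to the six residues that rule h) allows opposite a central 5-meC. `library_size` is a hypothetical helper, and the counts say nothing about how many clones a real library must contain.

```python
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def library_size(allowed: dict[int, str]) -> int:
    """Distinct helix variants at the protein level, given the one-letter
    residues allowed at each randomised position."""
    return prod(len(set(options)) for options in allowed.values())


# Total randomisation at -1, +2, +3 and +6
total = {pos: AMINO_ACIDS for pos in (-1, 2, 3, 6)}
# Partial randomisation: +3 limited to the rule h) options (Ala, Ser, Ile, Leu, Thr, Val)
partial = {**total, 3: "ASILTV"}

print(library_size(total))    # 160000
print(library_size(partial))  # 48000
```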

Preferably, the modified residue is 5-meC and the unmodified residue is C. However, other modifications may be targeted by the method of the invention. For example, zinc finger polypeptides may be designed which specifically bind to nucleic acids incorporating the base U, in preference to the equivalent base T. An advantage of the second embodiment of the invention is that zinc finger polypeptides may be developed to bind to any DNA sequence incorporating a modified base, irrespective of its positioning in the target DNA triplet.
In a further preferred aspect, the invention comprises a method for producing a zinc finger polypeptide capable of binding to a DNA sequence comprising a modified residue, but not to an identical sequence comprising an equivalent unmodified residue, comprising:

a) providing a nucleic acid library encoding a repertoire of zinc finger polypeptides each possessing more than one zinc finger, the nucleic acid members of the library being at least partially randomised at one or more of the positions encoding residues -1, +2, +3 and +6 of the α-helix in a first zinc finger and at one or more of the positions encoding residues -1, +2, +3 and +6 of the α-helix in a further zinc finger of the zinc finger polypeptides;

b) displaying the library in a selection system and screening it against a target DNA sequence comprising the modified residue;

c) isolating the nucleic acid members of the library encoding zinc finger polypeptides capable of binding to the target sequence; and

d) optionally, verifying that the zinc finger polypeptides do not bind significantly to a DNA sequence identical to the target DNA sequence but containing the unmodified residue in place of the modified residue.

In this aspect, the invention encompasses library technology described in our copending International patent application WO98/53057, incorporated herein by reference in its entirety. WO98/53057 describes the production of zinc finger polypeptide libraries in which each individual zinc finger polypeptide comprises more than one, for example two or three, zinc fingers; and wherein within each polypeptide partial randomisation occurs in at least two zinc fingers.

This allows for the selection of the "overlap" specificity, wherein, within each triplet, the choice of residue for binding to the third nucleotide (read 3' to 5' on the + strand) is influenced by the residue present at position +2 on the subsequent zinc finger, which displays cross-strand specificity in binding. The selection of zinc finger polypeptides incorporating cross-strand specificity of adjacent zinc fingers enables the selection of nucleic acid binding proteins with a higher degree of specificity than is otherwise possible.

Advantageously, in order to derive the greatest benefit, the binding site is selected such that the modified base is in position 3 of one of the triplets, such that cross-strand specificity can be relied upon to contact the parallel strand in the corresponding position and introduce a further level of discrimination.

In a third embodiment, the present invention may be applied to the production of zinc finger polypeptides capable of binding to a DNA sequence comprising an unmethylated C residue, but not to an identical sequence comprising a 5-meC residue. This may be carried out by differential screening, as set forth above. Moreover, rules may be applied in addition to or instead of screening.

Where the central residue of a target triplet is C, the use of Asp at position +3 of a zinc 
finger polypeptide allows preferential binding to C over 5-meC. 

Brief Description of the Figures 

Figure 1a is an alignment of the amino acid sequences of the three fingers from Zif268 used in a phage display library. Randomised residue positions in the α-helix of finger 2 are marked 'X' and are numbered above the alignment relative to the first helical residue (position +1). Residues which form the hydrophobic core are circled; zinc ligands are written as white letters on a black circle background; and positions comprising the secondary structure elements of a zinc finger are marked below the sequence.

Figure 1b shows amino acid sequences of the variant α-helical regions from some zinc fingers selected by phage display using the DNA binding site GCG GNG GCG, where the central (bold) nucleotide of the middle (underlined) triplet was either: (i) 5-methylcytosine, (ii) thymine, or (iii) cytosine. Amino acid sequences are listed below the DNA oligonucleotide used in their selection. Amino acid positions are numbered above the aligned sequences relative to the first helical residue (position +1). Circled residues (in position +3) are predicted to contact the middle nucleotide of the binding site.

Figure 1c shows a phage ELISA binding assay demonstrating discrimination of pyrimidines by representative phage-selected zinc fingers. The matrix shows three different zinc finger phage clones (x, y and z) reacted with four different DNA binding sites present at a concentration of 3 nM. Binding is represented by vertical bars which indicate the OD obtained by ELISA (Choo and Klug (1997) Curr. Opin. Str. Biol. 7:117-125). The amino acid sequences of the variant α-helical regions from the selected zinc fingers are: REDVLIRHGK (x), RADALMVHKR (y), and RGPDLARHGR (z). The DNA sequences contain the generic binding site GCGGNGGCG, where the central (bold) nucleotide was either: uracil (U), thymine (T), cytosine (C), or 5-methylcytosine (M).

Figure 2 shows the effect of cytosine methylation on DNA binding by phage-selected zinc fingers. Graphs show three different zinc finger phage binding to the DNA sequence GCGGCGGCG in the presence (circle) and absence (triangle) of methylation of the central base (bold). The zinc finger clones tested contained variant α-helical regions of the middle finger as follows: (a) RADALMVHKR, (b) RGPDLARHGR and (c) REDVLIRHGK. These respective zinc finger clones preferentially bind their cognate DNA site in the presence, absence, or regardless of cytosine methylation.

Figure 3 shows the binding site interactions of five zinc finger polypeptides, selected taking into account cross-strand specificity by overlapping finger randomisation, with each of the oligonucleotides used in the selection process. Cross-strand contacts are shown.

Figure 4 is analogous to Figure 2 and shows the binding curves for four of the 
polypeptides as described in Figure 3 to their respective oligonucleotides. 

Figure 5 shows discrimination between 5-meC and T by zfHAE(M). 
Figure 6 shows binding of zinc finger polypeptides zfHHA(M) and zfHAE(M) to a nucleotide sequence (Figure 6a) in response to selective methylation by addition of methylase enzymes (Figure 6b). Polypeptides zfHHA(Y) and zfHAE(Y) do not discriminate between methylated and unmethylated DNA, as expected.

Detailed Description of the Invention 

Randomisation of zinc finger polypeptides may be carried out at the DNA or protein level. Mutagenesis and screening of zinc finger polypeptides may be achieved by any suitable means. Preferably, the mutagenesis is performed at the nucleic acid level, for example by synthesising novel genes encoding mutant proteins and expressing these to obtain a variety of different proteins. Alternatively, existing genes can themselves be mutated, such as by site-directed or random mutagenesis, in order to obtain the desired mutant genes.

Mutations may be performed by any method known to those of skill in the art. Preferred, however, is site-directed mutagenesis of a nucleic acid sequence encoding the protein of interest. A number of methods for site-directed mutagenesis are known in the art, from methods employing single-stranded phage such as M13 to PCR-based techniques (see "PCR Protocols: A guide to methods and applications", M.A. Innis, D.H. Gelfand, J.J. Sninsky, T.J. White (eds.), Academic Press, New York, 1990). Preferably, the commercially available Altered Site II Mutagenesis System (Promega) may be employed, according to the directions given by the manufacturer.

Randomisation of the zinc finger binding motifs produced according to the invention is preferably directed to those residues where the code provided herein gives a choice of residues. For example, therefore, positions +1, +5 and +8 are advantageously randomised, whilst preferably avoiding hydrophobic amino acids; positions involved in binding to the nucleic acid, notably -1, +2, +3 and +6, may be randomised also, preferably within the choices provided by the rules of the present invention.
Screening of the proteins produced by mutant genes is preferably performed by expressing the genes and assaying the binding ability of the protein product. A simple and advantageously rapid method by which this may be accomplished is by phage display, in which the mutant polypeptides are expressed as fusion proteins with the coat proteins of filamentous bacteriophage, such as the minor coat protein pIII of bacteriophage M13 or gene III of bacteriophage fd, and displayed on the capsid of bacteriophage transformed with the mutant genes. The target nucleic acid sequence is used as a probe to bind directly to the protein on the phage surface and select the phage possessing advantageous mutants, by affinity purification. The phage are then amplified by passage through a bacterial host, and subjected to further rounds of selection and amplification in order to enrich the mutant pool for the desired phage and eventually isolate the preferred clone(s). Detailed methodology for phage display is known in the art and set forth, for example, in US Patent 5,223,409; Choo and Klug (1995) Current Opinion in Biotechnology 6:431-436; Smith (1985) Science 228:1315-1317; and McCafferty et al. (1990) Nature 348:552-554; all incorporated herein by reference. Vector systems and kits for phage display are available commercially, for example from Pharmacia.

Specific peptide ligands such as zinc finger polypeptides may moreover be selected for binding to targets by affinity selection using large libraries of peptides linked to the C terminus of the lac repressor LacI (Cull et al. (1992) Proc Natl Acad Sci USA, 89, 1865-9). When expressed in E. coli the repressor protein physically links the ligand to the encoding plasmid by binding to a lac operator sequence on the plasmid.

An entirely in vitro polysome display system has also been reported (Mattheakis et al. (1994) Proc Natl Acad Sci USA, 91, 9022-6) in which nascent peptides are physically attached via the ribosome to the RNA which encodes them.

The library of the invention may be randomised at those positions for which choices are given in the rules of the first embodiment of the present invention. In particular, the members of the library are randomised at position +3 for binding to a central 5-meC residue. In such a case, 5-meC binding polypeptides will be selected by comparative binding analyses against methylated and non-methylated binding sites. However, the rules set forth above allow the person of ordinary skill in the art to make informed choices concerning the desired codon usage at the given positions. For instance, position +3 in the case of a central 5-meC residue should be an Ala residue, encoded by the codon GCN.
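Whether a degenerate codon stays within the intended options can be checked by expanding it with the IUPAC nucleotide codes and translating each codon with the standard genetic code. The sketch below is illustrative: `encoded_residues` is a hypothetical helper, only a subset of IUPAC codes is included, and KCN is merely an example of a mixed codon that limits +3 to Ala or Ser, two of the rule h) options; it is not a codon scheme taken from this description.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# IUPAC degenerate nucleotide codes (subset)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "N": "ACGT"}


def encoded_residues(degenerate_codon: str) -> set[str]:
    """Return the one-letter amino acids ('*' = stop) encoded by a degenerate codon."""
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}


print(encoded_residues("GCN"))  # {'A'}: every GCN codon encodes Ala
print(encoded_residues("KCN"))  # Ala (GCN) or Ser (TCN)
```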

Zinc finger binding motifs designed according to the invention may be combined into nucleic acid binding proteins having a multiplicity of zinc fingers. Preferably, the proteins have at least two zinc fingers. In nature, zinc finger binding proteins commonly have at least three zinc fingers, although two-zinc finger proteins such as Tramtrack are known. The presence of at least three zinc fingers is preferred. Binding proteins may be constructed by joining the required fingers end to end, N-terminus to C-terminus. Preferably, this is effected by joining together the relevant nucleic acid coding sequences encoding the zinc fingers to produce a composite coding sequence encoding the entire binding protein. The invention therefore provides a method for producing a DNA binding protein as defined above, wherein the DNA binding protein is constructed by recombinant DNA technology, the method comprising the steps of:

a) preparing a nucleic acid coding sequence encoding two or more zinc finger binding motifs as defined above, placed N-terminus to C-terminus;

b) inserting the nucleic acid sequence into a suitable expression vector; and

c) expressing the nucleic acid sequence in a host organism in order to obtain the DNA binding protein.



A "leader" peptide may be added to the N-terminal finger. Preferably, the leader peptide is MAEEKP.
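At the protein level, steps a) to c) above reduce to concatenating finger sequences with a linker and adding the leader. The sketch below joins fingers with the preferred linker TGEKP and prefixes the preferred leader MAEEKP. The finger core is the first consensus finger with its leading P and trailing TG removed, since the leader and linker supply those residues. `assemble_protein` is a hypothetical helper, and back-translation to a coding sequence for the expression vector is not shown.

```python
LEADER = "MAEEKP"  # preferred leader peptide
LINKER = "TGEKP"   # preferred linker (TGEK is the alternative)


def assemble_protein(fingers: list[str], leader: str = LEADER, linker: str = LINKER) -> str:
    """Join finger sequences N-terminus to C-terminus and add the leader."""
    return leader + linker.join(fingers)


# First consensus finger without its leading P and trailing TG
core = "YKCPECGKSFSQKSDLVKHQRTH"
protein = assemble_protein([core, core, core])
print(protein)
print(len(protein))  # 6 + 3 * 23 + 2 * 5 = 85
```

In practice each copy of `core` would be replaced by a finger designed for its own target triplet.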



The nucleic acid encoding the DNA binding protein according to the invention can be incorporated into vectors for further manipulation. As used herein, vector (or plasmid) refers to discrete elements that are used to introduce heterologous nucleic acid into cells for either expression or replication thereof. Selection and use of such vehicles are well within the skill of the person of ordinary skill in the art. Many vectors are available, and selection of the appropriate vector will depend on the intended use of the vector, i.e. whether it is to be used for DNA amplification or for nucleic acid expression, the size of the DNA to be inserted into the vector, and the host cell to be transformed with the vector. Each vector contains various components depending on its function (amplification of DNA or expression of DNA) and the host cell with which it is compatible. The vector components generally include, but are not limited to, one or more of the following: an origin of replication, one or more marker genes, an enhancer element, a promoter, a transcription termination sequence and a signal sequence.

Both expression and cloning vectors generally contain nucleic acid sequences that enable the vector to replicate in one or more selected host cells. Typically in cloning vectors, this sequence is one that enables the vector to replicate independently of the host chromosomal DNA, and includes origins of replication or autonomously replicating sequences. Such sequences are well known for a variety of bacteria, yeast and viruses. The origin of replication from the plasmid pBR322 is suitable for most Gram-negative bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (e.g. SV40, polyoma, adenovirus) are useful for cloning vectors in mammalian cells. Generally, the origin of replication component is not needed for mammalian expression vectors unless these are used in mammalian cells competent for high level DNA replication, such as COS cells.

Most expression vectors are shuttle vectors, i.e. they are capable of replication in at least one class of organisms but can be transfected into another class of organisms for expression. For example, a vector is cloned in E. coli and then the same vector is transfected into yeast or mammalian cells even though it is not capable of replicating independently of the host cell chromosome. DNA may also be replicated by insertion into the host genome. However, the recovery of genomic DNA encoding the DNA binding protein is more complex than that of exogenously replicated vector because restriction enzyme digestion is required to excise DNA binding protein DNA. DNA can be amplified by PCR and be directly transfected into the host cells without any replication component.
Advantageously, an expression and cloning vector may contain a selection gene, also referred to as a selectable marker. This gene encodes a protein necessary for the survival or growth of transformed host cells grown in a selective culture medium. Host cells not transformed with the vector containing the selection gene will not survive in the culture medium. Typical selection genes encode proteins that confer resistance to antibiotics and other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement auxotrophic deficiencies, or supply critical nutrients not available from complex media.

As to a selective gene marker appropriate for yeast, any marker gene can be used which facilitates the selection for transformants due to the phenotypic expression of the marker gene. Suitable markers for yeast are, for example, those conferring resistance to the antibiotics G418, hygromycin or bleomycin, or those providing for prototrophy in an auxotrophic yeast mutant, for example the URA3, LEU2, LYS2, TRP1 or HIS3 gene.

Since the replication of vectors is conveniently done in E. coli, an E. coli genetic marker and an E. coli origin of replication are advantageously included. These can be obtained from E. coli plasmids, such as pBR322, Bluescript® vector or a pUC plasmid, e.g. pUC18 or pUC19, which contain both an E. coli replication origin and an E. coli genetic marker conferring resistance to antibiotics, such as ampicillin.

Suitable selectable markers for mammalian cells are those that enable the identification of cells competent to take up DNA binding protein nucleic acid, such as dihydrofolate reductase (DHFR, methotrexate resistance), thymidine kinase, or genes conferring resistance to G418 or hygromycin. The mammalian cell transformants are placed under selection pressure under which only those transformants which have taken up and are expressing the marker are uniquely adapted to survive. In the case of a DHFR or glutamine synthetase (GS) marker, selection pressure can be imposed by culturing the transformants under conditions in which the pressure is progressively increased, thereby leading to amplification (at its chromosomal integration site) of both the selection gene and the linked DNA that encodes the DNA binding protein. Amplification is the process by which genes in greater demand for the production of a protein critical for growth, together with closely associated genes which may encode a desired protein, are reiterated in tandem within the chromosomes of recombinant cells. Increased quantities of desired protein are usually synthesised from the thus amplified DNA.



Expression and cloning vectors usually contain a promoter that is recognised by the host
organism and is operably linked to the nucleic acid encoding the DNA binding protein. Such a
promoter may be inducible or constitutive. The promoters are operably linked to DNA
encoding the DNA binding protein by removing the promoter from the source DNA by
restriction enzyme digestion and inserting the isolated promoter sequence into the vector.
Both the native DNA binding protein promoter sequence and many heterologous
promoters may be used to direct amplification and/or expression of DNA encoding the
DNA binding protein.

Promoters suitable for use with prokaryotic hosts include, for example, the β-lactamase
and lactose promoter systems, alkaline phosphatase, the tryptophan (trp) promoter system
and hybrid promoters such as the tac promoter. Their nucleotide sequences have been
published, thereby enabling the skilled worker to ligate them operably to DNA encoding
the DNA binding protein, using linkers or adapters to supply any required restriction sites.
Promoters for use in bacterial systems will also generally contain a Shine-Dalgarno
sequence operably linked to the DNA encoding the DNA binding protein.


Preferred expression vectors are bacterial expression vectors which comprise a promoter
of a bacteriophage such as phage λ or T7 which is capable of functioning in the bacteria.
In one of the most widely used expression systems, the nucleic acid encoding the fusion
protein may be transcribed from the vector by T7 RNA polymerase (Studier et al.,
Methods in Enzymol. 185: 60-89, 1990). In the E. coli BL21(DE3) host strain, used in
conjunction with pET vectors, the T7 RNA polymerase is produced from the λ-lysogen
DE3 in the host bacterium, and its expression is under the control of the IPTG-inducible
lacUV5 promoter. This system has been employed successfully for over-production of
many proteins. Alternatively, the polymerase gene may be introduced on a lambda phage
by infection with an int- phage such as the CE6 phage, which is commercially available
(Novagen, Madison, USA). Other vectors include vectors containing the lambda PL
promoter such as pLEX (Invitrogen, NL), vectors containing the trc promoter such as
pTrcHisXpress™ (Invitrogen) or pTrc99 (Pharmacia Biotech, SE), or vectors containing
the tac promoter such as pKK223-3 (Pharmacia Biotech) or pMAL (New England
Biolabs, MA, USA).

Moreover, the DNA binding protein gene according to the invention preferably includes a
secretion sequence in order to facilitate secretion of the polypeptide from bacterial hosts,
such that it will be produced as a soluble native peptide rather than in an inclusion body.
The peptide may be recovered from the bacterial periplasmic space or the culture
medium, as appropriate.


Suitable promoter sequences for use with yeast hosts may be regulated or constitutive
and are preferably derived from a highly expressed yeast gene, especially a
Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or
ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating
pheromone genes coding for the a- or α-factor, or a promoter derived from a gene
encoding a glycolytic enzyme such as the promoter of the enolase,
glyceraldehyde-3-phosphate dehydrogenase (GAP), 3-phosphoglycerate kinase (PGK),
hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate
isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase,
phosphoglucose isomerase or glucokinase genes, or a promoter from the TATA binding
protein (TBP) gene can be used. Furthermore, it is possible to use hybrid promoters
comprising upstream activation sequences (UAS) of one yeast gene and downstream
promoter elements including a functional TATA box of another yeast gene, for example a
hybrid promoter including the UAS(s) of the yeast PHO5 gene and downstream promoter
elements including a functional TATA box of the yeast GAP gene (PHO5-GAP hybrid
promoter). A suitable constitutive PHO5 promoter is, e.g., a shortened acid phosphatase
PHO5 promoter devoid of the upstream regulatory elements (UAS), such as the PHO5
(-173) promoter element starting at nucleotide -173 and ending at nucleotide -9 of the
PHO5 gene.


DNA binding protein gene transcription from vectors in mammalian hosts may be
controlled by promoters derived from the genomes of viruses such as polyoma virus,
adenovirus, fowlpox virus, bovine papilloma virus, avian sarcoma virus, cytomegalovirus
(CMV), a retrovirus and Simian Virus 40 (SV40), from heterologous mammalian
promoters such as the actin promoter or a very strong promoter, e.g. a ribosomal protein
promoter, and from the promoter normally associated with the DNA binding protein
sequence, provided such promoters are compatible with the host cell systems.

Transcription of a DNA encoding the DNA binding protein by higher eukaryotes may be
increased by inserting an enhancer sequence into the vector. Enhancers are relatively
orientation and position independent. Many enhancer sequences are known from
mammalian genes (e.g. elastase and globin). However, typically one will employ an
enhancer from a eukaryotic cell virus. Examples include the SV40 enhancer on the late
side of the replication origin (bp 100-270) and the CMV early promoter enhancer. The
enhancer may be spliced into the vector at a position 5' or 3' to the DNA encoding the
DNA binding protein, but is preferably located at a site 5' from the promoter.


Advantageously, a eukaryotic expression vector encoding a DNA binding protein
according to the invention may comprise a locus control region (LCR). LCRs are capable
of directing high-level, integration site-independent expression of transgenes integrated
into host cell chromatin, which is of importance especially where the DNA binding
protein gene is to be expressed in the context of a permanently transfected eukaryotic cell
line in which chromosomal integration of the vector has occurred, or in transgenic
animals.

Eukaryotic vectors may also contain sequences necessary for the termination of
transcription and for stabilising the mRNA. Such sequences are commonly available
from the 5' and 3' untranslated regions of eukaryotic or viral DNAs or cDNAs. These
regions contain nucleotide segments transcribed as polyadenylated fragments in the
untranslated portion of the mRNA encoding DNA binding protein.

An expression vector includes any vector capable of expressing nucleic acids encoding
the DNA binding protein that are operatively linked with regulatory sequences, such as
promoter regions, that are capable of effecting expression of such DNAs. Thus, an
expression vector refers to a recombinant DNA or RNA construct, such as a plasmid, a
phage, a recombinant virus or other vector, that, upon introduction into an appropriate
host cell, results in expression of the cloned DNA. Appropriate expression vectors are
well known to those of ordinary skill in the art and include those that are replicable in
eukaryotic and/or prokaryotic cells and those that remain episomal or integrate into the
host cell genome. For example, DNAs encoding the DNA binding protein may be inserted
into a vector suitable for expression of cDNAs in mammalian cells, e.g. a CMV
enhancer-based vector such as pEVRF (Matthias et al., (1989) NAR 17, 6418).

Particularly useful for practising the present invention are expression vectors that provide
for the transient expression of DNA encoding the DNA binding protein in mammalian cells.
Transient expression usually involves the use of an expression vector that is able to
replicate efficiently in a host cell, such that the host cell accumulates many copies of the
expression vector and, in turn, synthesises high levels of the DNA binding protein. For the
purposes of the present invention, transient expression systems are useful, e.g., for
identifying DNA binding protein mutants, identifying potential phosphorylation sites, or
characterising functional domains of the protein.

Construction of vectors according to the invention employs conventional ligation
techniques. Isolated plasmids or DNA fragments are cleaved, tailored and religated in
the form desired to generate the plasmids required. If desired, analysis to confirm correct
sequences in the constructed plasmids is performed in a known fashion. Suitable
methods for constructing expression vectors, preparing in vitro transcripts, introducing
DNA into host cells, and performing analyses for assessing DNA binding protein
expression and function are known to those skilled in the art. Gene presence,
amplification and/or expression may be measured in a sample directly, for example by
conventional Southern blotting, Northern blotting to quantitate the transcription of
mRNA, dot blotting (DNA or RNA analysis) or in situ hybridisation, using an
appropriately labelled probe which may be based on a sequence provided herein. Those
skilled in the art will readily envisage how these methods may be modified, if desired.
In accordance with another embodiment of the present invention, there are provided cells
containing the above-described nucleic acids. Host cells such as prokaryotic, yeast
and higher eukaryotic cells may be used for replicating DNA and producing the DNA
binding protein. Suitable prokaryotes include eubacteria, such as Gram-negative or
Gram-positive organisms, for example E. coli, e.g. E. coli K-12 strains, DH5α and HB101,
or Bacilli. Further hosts suitable for the vectors encoding the DNA binding protein include
eukaryotic microbes such as filamentous fungi or yeast, e.g. Saccharomyces cerevisiae.
Higher eukaryotic cells include insect and vertebrate cells, particularly mammalian cells
including human cells, or nucleated cells from other multicellular organisms. In recent
years propagation of vertebrate cells in culture (tissue culture) has become a routine
procedure. Examples of useful mammalian host cell lines are epithelial or fibroblastic
cell lines such as Chinese hamster ovary (CHO) cells, NIH 3T3 cells, HeLa cells or 293T
cells. The host cells referred to in this disclosure comprise cells in in vitro culture as well
as cells that are within a host animal.

DNA may be stably incorporated into cells or may be transiently expressed using 
methods known in the art. Stably transfected mammalian cells may be prepared by 
transfecting cells with an expression vector having a selectable marker gene, and growing 
the transfected cells under conditions selective for cells expressing the marker gene. To 
prepare transient transfectants, mammalian cells are transfected with a reporter gene to 
monitor transfection efficiency. 

To produce such stably or transiently transfected cells, the cells should be transfected 
with a sufficient amount of the DNA binding protein-encoding nucleic acid to form the 
DNA binding protein. The precise amounts of DNA encoding the DNA binding protein 
may be empirically determined and optimised for a particular cell and assay. 

Host cells are transfected or, preferably, transformed with the above-described expression
or cloning vectors of this invention and cultured in conventional nutrient media modified
as appropriate for inducing promoters, selecting transformants or amplifying the genes
encoding the desired sequences. Heterologous DNA may be introduced into host cells by
any method known in the art, such as transfection with a vector encoding a heterologous
DNA by the calcium phosphate coprecipitation technique or by electroporation.
Numerous methods of transfection are known to the skilled worker in the field.
Successful transfection is generally recognised when any indication of the operation of
this vector occurs in the host cell. Transformation is achieved using standard techniques
appropriate to the particular host cells used.

Incorporation of cloned DNA into a suitable expression vector, transfection of eukaryotic
cells with a plasmid vector or a combination of plasmid vectors, each encoding one or
more distinct genes or with linear DNA, and selection of transfected cells are well known
in the art (see, e.g., Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual,
Second Edition, Cold Spring Harbor Laboratory Press).



Transfected or transformed cells are cultured using media and culturing methods known
in the art, preferably under conditions whereby the DNA binding protein encoded by the
DNA is expressed. The composition of suitable media is known to those in the art, so
that they can be readily prepared. Suitable culturing media are also commercially
available.

DNA binding proteins according to the invention may be employed in a wide variety of
applications, including use in diagnostics and as research tools. Advantageously, they may be
employed as diagnostic tools for identifying the presence of modified nucleic acid
molecules in a complex mixture. DNA binding molecules according to the invention can
differentiate single base modifications in target DNA molecules.



For example, zinc fingers may be fused to nucleic acid cleavage moieties, such as the
catalytic domain of a restriction enzyme, to produce a restriction enzyme capable of
cleaving only methylated DNA (see Kim et al., (1996) Proc. Natl. Acad. Sci. USA
93:1156-1160). Using such approaches, different zinc finger domains can be used to
create restriction enzymes with any desired recognition nucleotide sequence, but which
cleave DNA conditionally, depending on the particular modification of the nucleotides, for
instance methylation of the cytosine ring at position 5.
5-meC-targeting zinc fingers may moreover be employed in the regulation of gene
transcription, for example by specific cleavage of methylated (or unmethylated)
sequences using a fusion polypeptide comprising a zinc finger targeting domain and a
DNA cleavage domain, or by fusion of an activating domain (such as HSV VP16) to a
zinc finger, to activate transcription from a gene which possesses the zinc finger binding
sequence in its upstream sequences. Activation only occurs when the target DNA is
modified, such as by methylation. Zinc fingers capable of differentiating between U and
T may be used to preferentially target RNA or DNA, as required. Where RNA-targeting
polypeptides are intended, these are included in the term "DNA-binding molecule".


In a preferred embodiment, the zinc finger polypeptides of the invention may be 
employed to detect the presence of a particular base modification in a target nucleic acid 
sequence in a sample. 

Accordingly, the invention provides a method for determining the presence of a target
modified nucleic acid molecule, comprising the steps of:

a) preparing a DNA binding protein by the method set forth above which is specific for
the target modified nucleic acid molecule;
b) exposing a test system comprising the target modified nucleic acid molecule to the
DNA binding protein under conditions which promote binding, and removing any
DNA binding protein which remains unbound;
c) detecting the presence of the DNA binding protein in the test system.

In a preferred embodiment, the DNA binding molecules of the invention can be
incorporated into an ELISA assay. For example, phage displaying the molecules of the
invention can be used to detect the presence of the target DNA, and visualised using
enzyme-linked anti-phage antibodies.

Further improvements to the use of zinc finger phage for diagnosis can be made, for
example, by co-expressing a marker protein fused to the major coat protein (gVIII) of
bacteriophage. Since detection with an anti-phage antibody would then be unnecessary, the
time and cost of each diagnosis would be further reduced. Depending on the
requirements, suitable markers for display might include the fluorescent proteins (A. B.
Cubitt et al., (1995) Trends Biochem. Sci. 20, 448-455; T. T. Yang et al., (1996) Gene
173, 19-23), or an enzyme such as alkaline phosphatase, which has been previously
displayed on gIII (J. McCafferty, R. H. Jackson, D. J. Chiswell, (1991) Protein
Engineering 4, 955-961). Labelling different types of diagnostic phage with distinct
markers would allow multiplex screening of a single DNA sample. Nevertheless, even in
the absence of such refinements, the basic ELISA technique is reliable, fast, simple and
particularly inexpensive. Moreover, it requires no specialised apparatus, nor does it
employ hazardous reagents such as radioactive isotopes, making it amenable to routine
use in the clinic. The major advantage of the protocol is that it obviates the requirement
for gel electrophoresis, and so opens the way to automated DNA diagnosis.

The invention provides DNA binding proteins which can be engineered with exquisite
specificity. The invention therefore lends itself to the design of any molecule for which
specific DNA binding is required. For example, the proteins according to the invention
may be employed in the manufacture of chimeric restriction enzymes, in which a nucleic
acid cleaving domain is fused to a DNA binding domain comprising a zinc finger as
described herein.


The invention is described below, for the purpose of illustration only, in the following 
examples. 

Example 1

Preparation and Screening of a Zinc Finger Phage Display Library 

A powerful method of selecting DNA binding proteins is the cloning of peptides (Smith
(1985) Science 228, 1315-1317) or protein domains (McCafferty et al., (1990) Nature
348:552-554; Bass et al., (1990) Proteins 8:309-314) as fusions to the minor coat protein
(pIII) of bacteriophage fd, which leads to their expression on the tip of the capsid. A phage
display library is created comprising variants of the middle finger from the DNA binding
domain of Zif268.



Materials And Methods 

Construction And Cloning Of Genes. In general, procedures and materials are in
accordance with guidance given in Sambrook et al., Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor, 1989. The gene for the Zif268 fingers (residues 333-420) is
assembled from 8 overlapping synthetic oligonucleotides (see Choo and Klug, (1994)
PNAS (USA) 91:11163-67), giving SfiI and NotI overhangs. The genes for fingers of the
phage library are synthesised from 4 oligonucleotides by directional end-to-end ligation
using 3 short complementary linkers, and amplified by PCR from the single strand using
forward and backward primers which contain sites for NotI and SfiI respectively. Backward
PCR primers in addition introduce Met-Ala-Glu as the first three amino acids of the zinc
finger peptides, and these are followed by the residues of the wild-type or library fingers as
required. Cloning overhangs are produced by digestion with SfiI and NotI where necessary.
Fragments are ligated to 1 µg of similarly prepared Fd-Tet-SN vector. This is a derivative of
fd-tet-DOG1 (Hoogenboom et al., (1991) Nucleic Acids Res. 19, 4133-4137) in which a
section of the pelB leader and a restriction site for the enzyme SfiI (underlined) have been
added by site-directed mutagenesis using the oligonucleotide:


5' CTCCTGCACiTTGGACCTGTGCCATGGCCGGCTGGGCCGCATAGAATGGAACAACTAAAGC 3' (Seq ID No. 1)

which anneals in the region of the polylinker. Electrocompetent DH5α cells are
transformed with recombinant vector in 200 ng aliquots, grown for 1 hour in 2xTY medium
with 1% glucose, and plated on TYE containing 15 µg/ml tetracycline and 1% glucose.

Figure 1 shows the amino acid sequences of the three zinc fingers derived from Zif268 used
in the phage display library of the present invention. The top and bottom rows represent the
sequences of the first and third fingers respectively. The middle row represents the sequence
of the middle finger. The randomised positions in the α-helix of the middle finger have
residues marked 'X'. The amino acid positions are numbered relative to the first helical
residue (position 1). For amino acids at positions -1 to +8, excluding the conserved Leu and
His, codons are equal mixtures of (G,A,C)NN: T in the first base position is omitted in order
to avoid stop codons, but this has the unfortunate effect that the codons for Trp, Phe, Tyr
and Cys are not represented. Position +9 is specified by the codon A(G,A)G, allowing
either Arg or Lys. Residues of the hydrophobic core are circled, whereas the zinc ligands
are written as white letters on black circles. The positions forming the β-sheets and the
α-helix of the zinc fingers are marked below the sequence.
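The effect of the (G,A,C)NN mixture can be checked against the standard genetic code. The Python sketch below is an illustration added here, not part of the original procedure, and its helper names are hypothetical. It enumerates the 48 codons of the mixture and the two codons of A(G,A)G and reports which amino acids they encode.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: codons ordered by first, second and third base in
# TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand(degenerate):
    """Expand a degenerate codon given as the allowed bases at each position."""
    return ["".join(codon) for codon in product(*degenerate)]


# Positions -1 to +8: (G,A,C)NN, i.e. T excluded from the first position.
randomised = expand(["GAC", BASES, BASES])
encoded = {CODE[c] for c in randomised}
missing = set(AMINO_ACIDS) - {"*"} - encoded

# Position +9: A(G,A)G.
position_9 = {CODE[c] for c in expand(["A", "GA", "G"])}

print(len(randomised), "*" in encoded, sorted(missing), sorted(position_9))
# -> 48 False ['C', 'F', 'W', 'Y'] ['K', 'R']
```

The mixture therefore contains no stop codons, encodes 16 of the 20 amino acids, and lacks exactly the four residues named above, while position +9 is restricted to Arg or Lys.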

Phage Selection. Colonies are transferred from plates to 200 ml 2xTY/Zn/Tet (2xTY
containing 50 µM Zn(CH3COO)2 and 15 µg/ml tetracycline) and grown overnight. Phage
are purified from the culture supernatant by two rounds of precipitation using 0.2 volumes
of 20% PEG/2.5 M NaCl containing 50 µM Zn(CH3COO)2, and resuspended in zinc finger
phage buffer (20 mM HEPES pH 7.5, 50 mM NaCl, 1 mM MgCl2 and 50 µM
Zn(CH3COO)2). Streptavidin-coated paramagnetic beads (Dynal) are washed in zinc
finger phage buffer and blocked for 1 hour at room temperature with the same buffer made
up to 6% in fat-free dried milk (Marvel). Selection of phage is over three rounds: in the first
round, beads (1 mg) are saturated with biotinylated oligonucleotide (~80 nM) and then
washed prior to phage binding, but in the second and third rounds 1.7 nM oligonucleotide
and 5 µg poly dGC (Sigma) are added to the beads with the phage. Binding reactions
(1.5 ml) for 1 hour at 15°C are in zinc finger phage buffer made up to 2% in fat-free dried
milk (Marvel) and 1% in Tween 20, and typically contain 5x10^^ phage. Beads are
washed 15 times with 1 ml of the same buffer. Phage are eluted by shaking in 0.1 M
triethylamine for 5 min and neutralised with an equal volume of 1 M Tris pH 7.4. Log-phase
E. coli TG1 in 2xTY are infected with eluted phage for 30 min at 37°C and plated as
described above. Phage titres are determined by plating serial dilutions of the infected
bacteria.
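Titres from a serial dilution are conventionally calculated as the colony count divided by the product of the dilution and the volume plated. The sketch below assumes 10-fold dilutions, 0.1 ml plated per dilution and a countable range of 30 to 300 colonies; none of these values is specified in the text, and the counts are invented purely to illustrate the arithmetic. Because fd-tet phage confer tetracycline resistance, the result is expressed as transducing units (TU) per ml of the undiluted infected culture.

```python
def titre(colonies, dilution, volume_plated_ml):
    """Transducing units per ml of the undiluted infected culture.

    dilution is the fraction of the original culture in the plated
    suspension (e.g. 1e-6 for a 10^-6 dilution).
    """
    return colonies / (dilution * volume_plated_ml)


def mean_titre(counts, volume_plated_ml=0.1, countable=(30, 300)):
    """Average titre over plates whose colony counts fall in a countable range.

    counts maps each dilution to the number of tetracycline-resistant colonies.
    """
    low, high = countable
    estimates = [titre(n, d, volume_plated_ml)
                 for d, n in counts.items() if low <= n <= high]
    if not estimates:
        raise ValueError("no plate has a colony count in the countable range")
    return sum(estimates) / len(estimates)


# Hypothetical counts from a 10-fold dilution series, 0.1 ml plated per dilution.
counts = {1e-5: 520, 1e-6: 48, 1e-7: 5}
print(f"{mean_titre(counts):.2e} TU/ml")  # only the 10^-6 plate is countable
```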






wo 99/47656 3q PCT/GB99/00816 

Sequencing Of Selected Phage. Single colonies of transformants obtained after three rounds
of selection as described are grown overnight in 2xTY/Zn/Tet. Small aliquots of the
cultures are stored in 15% glycerol at -20°C, to be used as an archive. Single-stranded
DNA is prepared from phage in the culture supernatant and sequenced using the
Sequenase™ 2.0 kit (U.S. Biochemical Corp.).

Example 2 

Isolation of zinc fingers capable of C-T differentiation 

The phage are selected against oligonucleotides comprising the sequences GCG GCG GCG
and GCG GTG GCG. Some zinc finger DNA-binding domains are selected which bind
both sequences equally well (Fig. 1b, c). However, two additional zinc finger families
are isolated which are capable of differential binding to the two closely related sites (Fig.
1b, c). Sequence-specific recognition requires discrimination of the central base in the
binding site by the amino acid in position 3 of the recognition helix of the selected zinc
fingers, and it is noted that aspartate is selected to bind opposite cytosine in the triplet
GCG, while alanine is selected opposite thymine in the triplet GTG. The correlation
between thymine and alanine is particularly significant, as it implies a van der Waals
interaction between the amino acid side-chain and the 5-methyl group of the base.
Indeed, when thymine is mutated to deoxyuracil in the binding sites of such fingers, there
is a dramatic decrease in the strength of the intermolecular interaction (Fig. 1c). This
shows that these zinc fingers are capable of specifically recognising a 5-methyl group,
and suggests that similar fingers might be selected which bind 5-meC by the same token.

Example 3

Selection of 5-methylcytosine-specific zinc fingers 

The phage display library is screened with the synthetic binding site GCGGMGGCG,
containing a 5-meC base analogue (M). After 5 rounds of selection, zinc finger phage are
tested for binding to 5-meC and cytosine in the context of the above site, and those
capable of specifically binding the methylated site are sequenced in the region of the zinc
finger gene. Two different clones are isolated, which are identical to the DNA-binding
domains previously selected using the binding site GCGGTGGCG.

Hence the various zinc finger phage selections described above yield different fingers
able to bind the generic DNA sequence GCGGNGGCG, where N is thymine,
cytosine or 5-meC. A full complement of fingers is selected for recognition of the
cytosine/5-meC pair in the above context, some of which recognise one type of base
exclusively, while others bind both bases equally well (Figures 1c and 2).

The zinc finger amino acid residues which are selected by the interaction between the
randomised recognition helix and the central base of the DNA binding site are
rationalised in terms of previously elucidated zinc finger-DNA recognition rules. Fingers
with alanine in position +3 of the recognition helix specifically bind 5-meC and thymine
owing to a tight hydrophobic interaction between the side chain and the 5-methyl group
which is present in both bases. In contrast, a finger with valine in position +3 is also able
to accommodate cytosine in addition to the two methylated bases, by the use of different
rotamers. Fingers with aspartate in position +3 bind cytosine specifically, for example by
forming a ring structure which packs against the pyrimidine, as is observed in the refined
crystal structure of Zif268.
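The three cases above can be restated as a small lookup table. The sketch below only summarises the position +3 rules just described and is not a predictive model of zinc finger binding; it scans a sequence for the generic site GCGGNGGCG, writing 5-meC as M, and lists which of the three +3 residues would be expected to accept the central base. The names BINDS and predicted_binders are hypothetical.

```python
import re

# Central bases accommodated by the residue at helix position +3, as
# rationalised above ("M" denotes 5-methylcytosine).
BINDS = {
    "Ala": {"T", "M"},       # hydrophobic contact with the 5-methyl group
    "Val": {"C", "T", "M"},  # alternative rotamers also accommodate cytosine
    "Asp": {"C"},            # packs against unmethylated cytosine
}

# Generic site GCGGNGGCG with N = C, T or M.
SITE = re.compile(r"GCGG([CTM])GGCG")


def predicted_binders(sequence):
    """Return (start, central base, +3 residues expected to bind) for each site."""
    hits = []
    for match in SITE.finditer(sequence.upper()):
        base = match.group(1)
        residues = sorted(r for r, bases in BINDS.items() if base in bases)
        hits.append((match.start(), base, residues))
    return hits


print(predicted_binders("ttGCGGMGGCGaaGCGGCGGCGtt"))
# -> [(2, 'M', ['Ala', 'Val']), (13, 'C', ['Asp', 'Val'])]
```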


Example 4 

Selection of 5-meC Specific Zinc Fingers using Cross-Strand Specificity 

1. General Procedures

Construction of overlapping finger phage display libraries

Two zinc finger DNA binding domain libraries are constructed comprising the
amino acid framework of wild-type Zif268, but containing randomisations in amino acid
positions of fingers 2 and 3. The first library contains randomisations at F2 residue
position 6 and F3 residue positions -1, 1, 2 and 3, and recognises sequences of the form
5'-GXX-XCG-GCG-3'. The second library additionally contains variations in F2 position
3 and F3 positions 5 and 6, and recognises sequences of the form 5'-XXX-XXG-GCG-3'.
The libraries are denoted collectively as LF2/3.
The genes for the two zinc finger phage display libraries are assembled from synthetic
DNA oligonucleotides by directional end-to-end ligation using short complementary
DNA linkers. The oligonucleotides contain selectively randomised codons, encoding all
20 amino acids or a subset thereof, in the appropriate amino acid positions of fingers 2
and 3. The constructs are amplified by PCR using primers containing NotI and SfiI
restriction sites, digested with the above endonucleases to produce cloning overhangs,
and ligated into vector Fd-Tet-SN. Electrocompetent E. coli TG1 cells are transformed
with the recombinant vector and plated onto TYE medium (1.5% (w/v) agar, 1% (w/v)
Bactotryptone, 0.5% (w/v) Bacto yeast extract, 0.8% (w/v) NaCl) containing 15 µg/ml
tetracycline.

Phage selections 

Tetracycline-resistant colonies are transferred from plates into 2xTY medium
(16 g/litre Bactotryptone, 10 g/litre Bacto yeast extract, 5 g/litre NaCl) containing 50 µM
ZnCl2 and 15 µg/ml tetracycline, and cultured overnight at 30°C in a shaking incubator.
Cleared culture supernatant containing phage particles is obtained by centrifuging at 300g
for 5 minutes.
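The percentages (w/v) given for TYE agar convert to grams per litre by multiplying by 10, and additives are dosed from stocks using C1V1 = C2V2. A sketch of this arithmetic follows; the 200 ml volume echoes Example 1, while the 100 mM ZnCl2 and 5 mg/ml tetracycline stock concentrations are hypothetical, since the text does not state them.

```python
# Medium compositions in g per litre, converted from the values given above
# (1% (w/v) = 10 g per litre).
TYE_AGAR = {"agar": 15.0, "Bacto-tryptone": 10.0, "Bacto yeast extract": 5.0, "NaCl": 8.0}
TWO_X_TY = {"Bacto-tryptone": 16.0, "Bacto yeast extract": 10.0, "NaCl": 5.0}


def grams_for(recipe_g_per_litre, volume_ml):
    """Mass (g) of each component needed for volume_ml of medium."""
    return {name: g * volume_ml / 1000 for name, g in recipe_g_per_litre.items()}


def stock_volume(final_conc, stock_conc, volume_ml):
    """Volume of stock (ml) giving final_conc in volume_ml (C1V1 = C2V2).

    final_conc and stock_conc must be expressed in the same units.
    """
    return final_conc * volume_ml / stock_conc


# 200 ml of 2xTY with 50 uM ZnCl2 (from a hypothetical 100 mM = 100,000 uM stock)
# and 15 ug/ml tetracycline (from a hypothetical 5 mg/ml = 5,000 ug/ml stock).
print(grams_for(TWO_X_TY, 200))
print(stock_volume(50, 100_000, 200), stock_volume(15, 5_000, 200))
# -> {'Bacto-tryptone': 3.2, 'Bacto yeast extract': 2.0, 'NaCl': 1.0}
# -> 0.1 0.6
```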

DNAs of the form 5'-tatagtG-XXXX-GGCGtgtcacagtcagtccacacacgtc-3', and their
complementary strands, are chemically synthesised and annealed in 20 mM Tris-HCl, pH
8, 100 mM NaCl. The DNA sequences -XXXX- represent nucleotide sequences after
methylation by M.HaeIII (GGMC) or M.HhaI (GMGC). Since the DNA is chemically
synthesised, the DNA sites used in selections incorporate 5-meC (in appropriate positions
on both strands) with 100% yield. Selections are also carried out on derivatives of these
sites containing thymine rather than 5-meC in the appropriate positions (and with A
rather than C on the complementary strand as appropriate).

One picomole of each target site is bound to streptavidin-coated tubes (Boehringer
Mannheim) in 50 µl PBS containing 50 µM ZnCl2. Bacterial culture supernatant
containing phage is diluted 1:10 in selection buffer (PBS containing 50 µM ZnCl2, 2%
(w/v) fat-free dried milk (Marvel), 1% (v/v) Tween, 20 µg/ml sonicated salmon sperm
DNA), and 1 ml is applied to each tube. In order to increase the selection pressure, 50
pmol of soluble (unbiotinylated) competitor sites are synthesised and added to the binding
mixtures: selections for phage that bind the methylated DNA contain competitors with
cytosine or thymine at the appropriate positions; selections for phage that discriminate
thymine instead of 5-meC in the recognition sites of the methylase enzymes contain DNA
competitors with cytosine or 5-meC at the appropriate positions. After 1 hour at 20°C,
the tubes are emptied and washed 20 times with PBS containing 50 µM ZnCl2, 2% (w/v)
fat-free dried milk (Marvel) and 1% (v/v) Tween. Retained phage are eluted in 0.1 ml
0.1 M triethylamine and neutralised with an equal volume of 1 M Tris (pH 7.4).
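The competitor scheme can be summarised programmatically. The following sketch builds the top strands of the biotinylated target and the two soluble competitors from the template and methylase sites given above. It does not model the complementary strand, biotinylation or the 50 pmol to 1 pmol ratio of competitor to target, and the function and variable names are hypothetical.

```python
# Oligonucleotide template described above; {site} stands for -XXXX-.
TEMPLATE = "tatagtG{site}GGCGtgtcacagtcagtccacacacgtc"
# Methylated recognition sequences ("M" = 5-methylcytosine).
METHYLATED_SITES = {"M.HaeIII": "GGMC", "M.HhaI": "GMGC"}
# Base at the methylation position in the biotinylated target ->
# bases placed at that position in the soluble competitors.
COMPETITOR_BASES = {"M": ("C", "T"), "T": ("C", "M")}


def design(methylase, target_base):
    """Return (target, competitors) as top-strand sequences for one selection."""
    site = METHYLATED_SITES[methylase]

    def fill(base):
        return TEMPLATE.format(site=site.replace("M", base))

    return fill(target_base), [fill(b) for b in COMPETITOR_BASES[target_base]]


target, competitors = design("M.HhaI", "M")
print(target)
for competitor in competitors:
    print(competitor)
```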

Logarithmic-phase E. coli TG1 (0.5 ml) are infected with eluted phage (50 µl) and
cultured overnight at 30°C in 2xTY medium containing 50 µM ZnCl2 and 15 µg/ml
tetracycline, to prepare phage for subsequent rounds of selection. After 4 rounds of
selection, E. coli TG1 infected with selected phage are plated, and individual colonies are
picked and used to prepare phage for ELISA assays and DNA sequencing.


ELISA to determine nucleotide discrimination. 

Binding sites are synthesised as described above, including biotinylated sites
where 5-meC (M) is replaced by C or T (with appropriate bases in the complementary
strand). Two-fold dilutions of DNA are added to separate wells of a streptavidin-coated
microtitre plate (Boehringer Mannheim) in 50 µl PBS containing 50 µM ZnCl2 (PBS/Zn).
Phage solution (bacterial culture supernatant diluted 1:10 in PBS/Zn containing 2% (w/v)
fat-free dried milk (Marvel), 1% (v/v) Tween and 20 µg/ml sonicated salmon sperm
DNA) is applied to each well (50 µl/well). Binding is allowed to proceed for one hour at
20°C. Unbound phage are removed by washing 6 times with PBS/Zn containing 1% (v/v)
Tween, then 3 times with PBS/Zn. Bound phage are detected by ELISA using
horseradish peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and the
colourimetric signal is quantitated using SOFTMAX 2.32 (Molecular Devices).

ELISA using an enzymatically methylated DNA binding site.

Complementary DNA oligonucleotides containing the sequences methylated by
M.HaeIII and M.HhaI are chemically synthesised and annealed as described above. The
DNA is used in binding assays without exposure to the methylases, or after reaction with
either or both methylase enzymes according to the manufacturer's instructions (New
England Biolabs). DNA binding sites (0.5 pmol) are added to wells of a streptavidin-coated
microtitre plate (Boehringer Mannheim) in 50µl PBS containing 50µM ZnCl2
(PBS/Zn). The binding of various zinc finger phage clones is assayed by ELISA as
described above. 

DNA sequence analysis 

The coding sequence of individual zinc finger clones is amplified by PCR using
external primers complementary to phage sequence. These PCR products are then
sequenced manually using Thermo Sequenase cycle sequencing (Amersham Life 
Science). 

2. Experimental Results 
Design of sequence-specific zinc finger proteins which bind enzymatically methylated
DNA sites. 

The three-finger DNA-binding domain of transcription factor Zif268 binds the
DNA sequence GCGTGGGCG. Phage display libraries of this zinc finger domain have
been used to elucidate aspects of the base-recognition mechanism of zinc fingers and to
select fingers which bind to predetermined DNA sequences. We have constructed a set
of phage display libraries in which amino acid positions from both finger 2 (F2) and
finger 3 (F3) of Zif268 are simultaneously randomised in order to evaluate the effect of
inter-finger synergy on the specificity of DNA binding. These libraries, hereafter denoted
collectively as LF2/3, contain variants which specifically recognise DNA sequences of
the form XXXXCGGCG or GXXXCGGCG, where X is any nucleotide.

The HaeIII and HhaI methyltransferases modify the internal cytosine of their respective
DNA recognition sequences GGCC and GCGC, converting them to GGMC and GMGC
(M = 5-meC). We therefore designed two DNA oligos, one containing the sequence
GGCC CGGCG and the other GCGC CGGCG, which include the sites required for
modification by the respective methylases M.HaeIII and M.HhaI. The oligos also place these
sequences in the context of binding sites that could be used to screen LF2/3 for zinc
fingers that specifically recognise the modified DNA.



The two different target DNA oligonucleotides are prepared using solid-phase DNA
synthesis such that 5-meC is chemically incorporated into the appropriate positions
(M in Table 1) with 100% yield, and a biotin group is added to the 5' terminus
of one DNA strand. The synthetically modified DNAs are coupled to a solid support
coated with streptavidin and used in separate phage selections as described above. After
four rounds of selection, individual zinc finger clones from either selection are screened
by phage ELISA for binding to the methylated form of their DNA target site and
discrimination against a control oligo containing the unmodified DNA. Four different
zinc finger phage clones with varying specificity are selected for further study: (i) clone
zfHAE(M) preferentially binds the methylated DNA incorporating the HaeIII site; (ii)
clone zfHHA(M) preferentially binds the methylated DNA incorporating the HhaI site;
(iii) clone zfHAE(Y) binds the DNA incorporating the HaeIII site regardless of its
methylation status; and (iv) clone zfHHA(Y) binds the DNA incorporating the HhaI site
regardless of its methylation status.

Table 1 shows the sequences of the oligonucleotides used for selection and of the
resulting clones obtained.
Table 1

Oligonucleotide sequences

HAE(M)   5'-tatagtG-GGMC-GGCGtgtcacagtcagtccacacacgtc-3'
HHA(M)   5'-tatagtC-GMGC-GGCGtgtcacagtcagtccacacacgtc-3'
HAE(Y)   5'-tatagtG-GGYC-GGCGtgtcacagtcagtccacacacgtc-3'
HHA(Y)   5'-tatagtG-GYGC-GGCGtgtcacagtcagtccacacacgtc-3'
HAE(T)   5'-tatagtG-GGTC-GGCGtgtcacagtcagtccacacacgtc-3'

wherein:
M = 5-meC
Y = pyrimidine (C/T/M)
R = purine (A/G)

Zinc finger clones (α-helix residues at positions -1, 1, 2, 3, 4, 5 and 6 of each finger)

Clone      F1        F2        F3
zfHAE(M)   RSDELTR   RSDDLSQ   RKHHRKE
zfHHA(M)   RSDELTR   RSDDLTR   YDGARKR
zfHAE(Y)   RSDELTR   RSDDLTG   HNRDRKR
zfHHA(Y)   RSDELTR   RSDHLSA   TNSTRTK
zfHAE(T)   RSDELTR   RSDDLST   RNDHRKT
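The legend to Table 1 defines the ambiguity codes used in the oligonucleotide cores. A minimal Python sketch of expanding such a core into the concrete binding sites it stands for (for example, the C, T and 5-meC variants used in the discrimination ELISA) is given below, for illustration only. It follows this document's own legend, in which M denotes 5-meC. In standard IUPAC notation M means A or C, so the mapping is specific to this document. The names `CODES` and `expand_site` are hypothetical.

```python
from itertools import product

# Ambiguity codes as defined in the Table 1 legend (not standard IUPAC:
# here M is 5-methylcytosine, Y is C/T/M and R is A/G).
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "M": "M",
    "Y": "CTM",
    "R": "AG",
}


def expand_site(core: str) -> list:
    """Return every concrete sequence encoded by a degenerate core.

    Hyphens (used in the tables to mark subsite boundaries) are ignored.
    """
    bases = core.replace("-", "").upper()
    choices = [CODES[b] for b in bases]
    return ["".join(combo) for combo in product(*choices)]


print(expand_site("G-GGYC-GGCG"))
# ['GGGCCGGCG', 'GGGTCGGCG', 'GGGMCGGCG']
```

Each Y multiplies the number of concrete sites by three, so a core with a single Y position, such as HAE(Y) or HHA(Y), corresponds to exactly three synthetic binding sites.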



Zinc finger phage binding for each of the above clones is titrated against different
amounts of methylated and unmethylated DNA oligos to derive values of the apparent
dissociation constants (Kds) for either DNA binding site (see Figures 4 and 5). The
apparent Kd of each clone for the optimally bound DNA site(s) is in the nanomolar range,
similar to that of the wild-type Zif268 DNA-binding domain for its preferred target site in
this assay. The Kds obtained are shown in Table 2. Clones zfHAE(M) and zfHHA(M)
preferentially bind their respective DNA target sites when 5-meC is incorporated into the
correct nucleotide positions, and discriminate against the unmethylated DNA sites by
factors of approximately 20-fold and 5-fold respectively. The discrimination shown by
zfHAE(M) in particular is good considering the simple DNA recognition mechanism of
zinc fingers, and that only a single functional group per DNA molecule has been altered.
Clones zfHAE(Y) and zfHHA(Y) bind their respective target sites but do not show any
preference for either the modified or unmodified forms.
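The text does not state how the apparent Kds are extracted from the titrations. One common approach, assumed here purely for illustration, is to fit a single-site binding hyperbola, signal = Bmax·[DNA]/(Kd + [DNA]), to the ELISA signal measured over the two-fold DNA dilution series. The Python sketch below does this with a log-spaced grid search over Kd and solves for Bmax in closed form at each grid point. The function names, the 100 nM top concentration and the synthetic data are hypothetical. Treating the added DNA concentration as the free concentration is a further simplifying assumption.

```python
def single_site(conc: float, kd: float, bmax: float) -> float:
    """Single-site binding hyperbola (assumed model)."""
    return bmax * conc / (kd + conc)


def fit_apparent_kd(conc, signal, kd_min=0.01, kd_max=1000.0, steps=4000):
    """Least-squares fit of Kd and Bmax by grid search over log(Kd).

    For a fixed Kd the model is linear in Bmax, so the best Bmax has the
    closed form sum(x*y) / sum(x*x) with x = conc / (kd + conc).
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)  # log-spaced grid
        x = [c / (kd + c) for c in conc]
        bmax = sum(xi * yi for xi, yi in zip(x, signal)) / sum(xi * xi for xi in x)
        sse = sum((yi - bmax * xi) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Two-fold dilution series of DNA, as in the ELISA protocol
# (hypothetical top concentration of 100 nM, 10 wells).
conc = [100.0 / 2 ** k for k in range(10)]
# Synthetic, noise-free signal for an illustrative Kd of 2 nM.
signal = [single_site(c, kd=2.0, bmax=1.5) for c in conc]

kd, bmax = fit_apparent_kd(conc, signal)
print(f"apparent Kd ~ {kd:.2f} nM, Bmax ~ {bmax:.2f}")
```

A grid search keeps the sketch dependency-free. With real, noisy ELISA data a nonlinear least-squares routine and replicate wells would normally be preferred.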

The four zinc finger clones isolated by phage display using synthetic 5-meC-containing
DNA target sites are next tested for binding to enzymatically methylated DNA. In this
assay a single DNA fragment is used that incorporates both the GGCC CGGCG and the
GCGC CGGCG zinc finger binding site sequences (Figure 6a), which are additionally
substrates for methylation by M.HaeIII and M.HhaI respectively. Each zinc finger clone
is tested for binding to the DNA before and after modification using one or both
methylases. Figure 6b shows that, in contrast to zfHAE(Y) and zfHHA(Y), which both
recognise the DNA regardless of methylation status (as would be expected),
zfHAE(M) and zfHHA(M) bind only after specific methylation of the DNA by the
appropriate methylase enzyme. Thus enzymatic modification of cytosine to 5-meC can
act as a switch that induces specific protein-DNA complex formation.
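The methylation switch can be illustrated with a short Python sketch. It applies each methylase to a top-strand sequence and then asks whether a clone's methylated target site is present. This is a string model only, built on stated assumptions:

* It marks the modified cytosine as M in the recognition sites GGCC (M.HaeIII) and GCGC (M.HhaI), following the GGMC and GMGC notation of Table 1.
* It ignores the complementary strand.
* It uses a hypothetical fragment joining the two binding sites described above.

The names `METHYLASES`, `methylate` and `fragment` are illustrative.

```python
# Recognition site and 0-based offset of the cytosine that each methylase
# converts to 5-methylcytosine (written M, as in Table 1).
METHYLASES = {
    "M.HaeIII": ("GGCC", 2),  # GGCC -> GGMC
    "M.HhaI": ("GCGC", 1),    # GCGC -> GMGC
}


def methylate(seq: str, enzyme: str) -> str:
    """Return the top strand of `seq` after treatment with `enzyme`."""
    site, offset = METHYLASES[enzyme]
    out = list(seq)
    start = seq.find(site)
    while start != -1:
        out[start + offset] = "M"
        start = seq.find(site, start + 1)  # also catches overlapping sites
    return "".join(out)


# Hypothetical fragment carrying both binding sites from the text.
fragment = "GGCCCGGCG" + "AT" + "GCGCCGGCG"
targets = {"zfHAE(M)": "GGMCCGGCG", "zfHHA(M)": "GMGCCGGCG"}

for treatment in ([], ["M.HaeIII"], ["M.HhaI"], ["M.HaeIII", "M.HhaI"]):
    dna = fragment
    for enzyme in treatment:
        dna = methylate(dna, enzyme)
    bound = [clone for clone, site in targets.items() if site in dna]
    print(treatment or "unmethylated", dna, bound)
```

In this model each methylated-site clone finds its site only after treatment with its own methylase. That mirrors the behaviour reported for zfHAE(M) and zfHHA(M). The same function also turns GCGGCCGCG into GCGGMCGCG, the HaeIII conversion relied on in Example 6.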
Table 2

Clone      Oligonucleotide   Apparent Kd
zfHAE(M)   G-GGMC-GGCG       2.0 +/- 0.2 nM
           G-GGCC-GGCG       62 +/- 29 nM
zfHHA(M)   G-GMGC-GGCG       14 +/- 3.2 nM
           G-GCGC-GGCG       62 +/- 22 nM
zfHAE(Y)   G-GGMC-GGCG       6.3 +/- 1.4 nM
           G-GGCC-GGCG       2.0 +/- 0.2 nM
zfHHA(Y)   G-GMGC-GGCG       14 +/- 2.0 nM
           G-GCGC-GGCG       11 +/- 2.4 nM
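The discrimination shown by the two methylation-specific clones in Table 2 can be summarised as a fold preference. Here it is taken, by a common convention that the source does not state, as the ratio Kd(unmethylated)/Kd(methylated). The Python sketch below computes this ratio from the tabulated values. It also gives a crude range obtained by pushing each Kd to the end of its quoted error (simple interval arithmetic, an assumption rather than a proper error analysis). `TABLE_2` and `discrimination` are hypothetical names.

```python
# Apparent Kds (nM) from Table 2 as (value, quoted error).
TABLE_2 = {
    "zfHAE(M)": {"methylated": (2.0, 0.2), "unmethylated": (62.0, 29.0)},
    "zfHHA(M)": {"methylated": (14.0, 3.2), "unmethylated": (62.0, 22.0)},
}


def discrimination(kd_mod, kd_unmod):
    """Fold preference for the methylated site, Kd(unmod) / Kd(mod).

    Also returns a crude low/high range obtained by pushing both Kds to the
    ends of their quoted errors (simple interval arithmetic).
    """
    (m, dm), (u, du) = kd_mod, kd_unmod
    return u / m, (u - du) / (m + dm), (u + du) / (m - dm)


for clone, kds in TABLE_2.items():
    ratio, low, high = discrimination(kds["methylated"], kds["unmethylated"])
    print(f"{clone}: {ratio:.1f}-fold (range {low:.1f} to {high:.1f})")
# zfHAE(M): 31.0-fold (range 15.0 to 50.6)
# zfHHA(M): 4.4-fold (range 2.3 to 7.8)
```

The wide ranges reflect the large quoted errors on the unmethylated Kds, so the fold values are best read as approximate.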

Synergistic zinc finger pairs that discriminate 5-methylcytosine from thymine. 

The 5-methyl group of methylcytosine and thymine is a prominent feature of the
DNA major groove which contributes important intermolecular (hydrophobic) contacts in
protein-DNA interactions but is stereochemically indistinguishable in the two different
bases. Consequently, zinc fingers, which frequently achieve DNA recognition by 1:1
contacts between amino acids and bases, often fail to discriminate between the two
closely related bases. The phage-selected clone zfHHA(M) is one such zinc finger
protein which accepts both thymine and 5-meC with almost equal affinity (Figure 5). In
this case it is likely that the aromatic ring of the tyrosine at position -1 of finger 3 forms
equally good hydrophobic contacts with the methyl group of either base.

One way in which zinc finger proteins could distinguish 5-meC from thymine is to
discriminate the complementary nucleotide in the base pair. Zinc finger proteins such as
Zif268 make base contacts predominantly to only one DNA strand (the 'antiparallel'
strand) but, importantly, they can also form 'cross-strand' contacts to certain bases on the
complementary 'parallel' strand. It has been shown that these contacts can make
important contributions to DNA-binding specificity. Thus the zinc fingers of Zif268 and
related proteins can be regarded as binding to overlapping 4bp subsites, where the
specificity for the base pair at the boundary between adjacent subsites potentially arises
via contacts from two synergistic zinc fingers to each of the nucleotides in the base pair
(Figure 3). Therefore a zinc finger protein can distinguish a 5-meC:G base pair from a
T:A base pair provided the base pair is positioned at the overlap between adjacent DNA
subsites, such that a contact to the 'parallel' strand can be made.

This is the case for the DNA binding site GGMCCGGCG, in which the 5-meC base (M)
is discriminated from thymine by zinc finger clone zfHAE(M). According to the
conventional model of zinc finger-DNA recognition, based on the crystal structure of the
Zif268-DNA complex and subsequent biochemical experiments, the 5-meC base in the
binding site is contacted by the glutamine residue in α-helical position +6 of finger 2
(Figure 3). Additionally, the complementary guanine can be recognised using a
synergistic contact from the histidine residue in α-helical position +2 of finger 3 (Figure 3).

In order to investigate further the discrimination between 5-meC and thymine, another
zinc finger clone, zfHAE(T), is selected which is specific for thymine instead of 5-meC in
the context of the above binding site. This clone makes use of a cross-strand contact
from the aspartate in position +2 of finger 3 to recognise adenine in the 'parallel' strand. In
this respect zfHAE(T) is remarkably like the wild-type Zif268 DNA-binding domain,
whose zinc fingers each have an Arg-Ser-Asp triad that makes inter- and intramolecular
contacts, including cross-strand contacts from the aspartate. Discrimination in favour of
thymine by zfHAE(T) is relatively stronger than discrimination for 5-meC by zfHAE(M),
presumably owing to the stabilising effect of intramolecular (protein-protein buttressing)
interactions and the favourable geometry of this network of contacts.

The dissociation constants for the interactions seen between zfHAE(M), zfHHA(M) and 
zfHAE(T) and 5-meC or T oligonucleotides are set forth in Table 3. 

Table 3

Kds of each clone for 5-meC and T oligonucleotides

Clone      Oligonucleotide   Kd
zfHAE(M)   G-GGMC-GGCG       2.0 +/- 0.2 nM
           G-GGTC-GGCG       27 +/- 4.4 nM
zfHHA(M)   G-GMGC-GGCG       14 +/- 3.2 nM
           G-GTGC-GGCG       6.1 +/- 4.5 nM
zfHAE(T)   G-GGMC-GGCG       3.4 +/- 0.5 nM
           G-GGTC-GGCG       n/a



Example 5

Methylcytosine-specific restriction enzyme 

Phage-selected or rationally designed zinc finger domains which recognise modified
bases, including 5-meC, can be converted to restriction enzymes which cleave DNA
containing those modified bases. This is achieved by coupling a
modified base-specific zinc finger to a cleavage domain of a restriction enzyme or other
nucleic acid cleaving moiety.

A method of converting zinc finger DNA-binding domains to chimaeric restriction
endonucleases has been described in Kim et al. (1996) Proc. Natl. Acad. Sci. USA
93:1156-1160. In order to demonstrate the applicability of methylcytosine-specific zinc
fingers to restriction enzymes, a fusion is made between the catalytic domain of FokI as
described by Kim et al. and the 5-meC-specific zinc finger described in Example 3.
Fusion of the 5-meC zinc finger nucleic acid-binding domain to the catalytic domain of
the FokI restriction enzyme results in a novel endonuclease which cleaves DNA adjacent to
the DNA recognition sequence of the zinc finger, namely GCGGMGGCG.
The oligonucleotides GCGGMGGCG and GCGGCGGCG are synthesised and ligated to
random DNA sequences. After incubation with the zinc finger restriction enzyme, the
nucleic acids are analysed by gel electrophoresis. Bands indicating cleavage of the
nucleic acid at a position corresponding to the location of the oligonucleotide
GCGGMGGCG are visible with the methylated, but not the unmethylated, nucleic acid.

In a further experiment, the 5-meC-specific zinc finger is fused to an amino-terminal
copper/nickel binding motif. Under the correct redox conditions (Nagaoka, M., et al.
(1994) J. Am. Chem. Soc. 116:4085-4086), sequence-specific DNA cleavage is
observed only in the presence of 5-meC-containing DNA incorporating the
oligonucleotide GCGGMGGCG.

Example 6 

Determination of methylase activity in vivo 


A reporter system is produced which gives a reporter signal conditional on the activity
of a DNA methylase.

A transient transfection system using zinc finger transcription factors is produced as
described in Choo, Y., et al. (1997) J. Mol. Biol. 273:525-532. This system comprises
an expression plasmid which produces a 5-meC-specific phage-selected zinc finger fused
to the activation domain of HSV VP16, and a reporter plasmid which contains the
recognition sequence of the zinc finger upstream of a CAT reporter gene.

Thus, a zinc finger which recognises the DNA sequence GCGGCCGCG is selected by
phage display as described in Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci.
U.S.A. 91:11163-11167. By the method of the preceding examples, a further zinc finger
is selected which is capable of binding to the sequence GCGGMGGCG, where the central
base M is 5-meC, and is used to construct transcription factors as described in the
foregoing.
A transient expression experiment is conducted, wherein the CAT reporter gene on the
reporter plasmid is placed downstream of the sequence GCGGCCGCG. The reporter
plasmid is cotransfected with a plasmid vector expressing the zinc finger-HSV fusion
under the control of a constitutive promoter. No activation of CAT gene expression is
observed.

However, when the same experiment is conducted in the presence of HaeIII methylase,
CAT expression is observed as a result of the methylation of GCGGCCGCG to form
GCGGMCGCG, and consequent binding of the zinc finger transcription factor to its
recognition sequence.
1. A zinc finger polypeptide which binds to a target DNA sequence containing a
modified base but not to an identical sequence containing the equivalent unmodified base. 

2. A polypeptide according to claim 1, wherein the target DNA sequence comprises 
a triplet having 5-meC at the central position, and binding to the 5-meC residue by an
α-helical zinc finger binding motif in the polypeptide is achieved by placing an Ala residue
at position +3 of the α-helix.

3. A method for preparing a DNA binding polypeptide of the Cys2-His2 zinc finger
class capable of binding to a DNA triplet in a target DNA sequence comprising 5-meC as
the central residue in the target DNA triplet, wherein binding to the 5-meC residue by an
α-helical zinc finger DNA binding motif of the polypeptide is achieved by placing an Ala
residue at position +3 of the α-helix of the zinc finger.

4. A method for preparing a DNA binding polypeptide of the Cys2-His2 zinc finger
class capable of binding to a DNA triplet in a target DNA sequence comprising 5-meC, but
not to an identical triplet comprising unmethylated C, wherein binding to each base of the
triplet by an α-helical zinc finger DNA binding motif in the polypeptide is determined as
follows:

a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg and/or position
++2 is Asp;

b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln or Glu and ++2
is not Asp;

c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr and
position ++2 is Asp; or position +6 is a hydrophobic amino acid other than Ala;

d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any amino acid,
provided that position ++2 in the α-helix is not Asp;

e) if the central base in the triplet is G, then position +3 in the α-helix is His;

f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;

g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser, Ile,
Leu, Thr or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small
residue;

h) if the central base in the triplet is 5-meC, then position +3 in the α-helix is Ala, Ser,
Ile, Leu, Thr or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a
small residue;

i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;

j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln and position +2 is
Ala;

k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn; or position -1 is
Gln and position +2 is Ser;

l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp and position +1
is Arg.
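Rules a) to l) of claim 4 amount to a lookup from each base of the target triplet to constraints on particular helix positions. For illustration only, a minimal Python sketch below transcribes the rules into three dictionaries keyed by base, using M for 5-meC. The rule text is paraphrased from the claim, and the dictionary and function names are hypothetical. The claim gives no rule for an unmethylated C at the central position, and none for 5-meC at the 5' or 3' positions, so those lookups deliberately raise KeyError.

```python
# Claim 4, rules a) to d): the 5' base constrains positions +6 and ++2.
FIVE_PRIME = {
    "G": "+6 Arg and/or ++2 Asp",
    "A": "+6 Gln or Glu; ++2 not Asp",
    "T": "+6 Ser or Thr and ++2 Asp; or +6 hydrophobic other than Ala",
    "C": "+6 any amino acid, provided ++2 is not Asp",
}

# Rules e) to h): the central base constrains position +3.
_SMALL = "; if Ala, one of -1 or +6 is a small residue"
CENTRAL = {
    "G": "+3 His",
    "A": "+3 Asn",
    "T": "+3 Ala, Ser, Ile, Leu, Thr or Val" + _SMALL,
    "M": "+3 Ala, Ser, Ile, Leu, Thr or Val" + _SMALL,  # M = 5-meC
}

# Rules i) to l): the 3' base constrains positions -1, +1 and +2.
THREE_PRIME = {
    "G": "-1 Arg",
    "A": "-1 Gln and +2 Ala",
    "T": "-1 Asn; or -1 Gln and +2 Ser",
    "C": "-1 Asp and +1 Arg",
}


def helix_constraints(triplet: str) -> dict:
    """Return the claim 4 constraints for a 5'-to-3' triplet such as 'GMG'.

    Raises KeyError for combinations the claim does not cover
    (e.g. an unmethylated C at the central position).
    """
    five, central, three = triplet.upper()
    return {
        "5'": FIVE_PRIME[five],
        "central": CENTRAL[central],
        "3'": THREE_PRIME[three],
    }


for part, rule in helix_constraints("GMG").items():
    print(f"{part:>7}: {rule}")
```

Keeping the rules as plain text rather than as executable predicates makes it easy to check the transcription against the claim.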

5. A method for producing a zinc finger polypeptide capable of binding to a DNA
sequence comprising a modified residue, but not to an identical sequence comprising an
equivalent unmodified residue, comprising the steps of:

a) providing a DNA library encoding a repertoire of zinc finger polypeptides, the
DNA members of the library being at least partially randomised at one or more of the
positions encoding residues -1, 2, 3 and 6 of an α-helical zinc finger binding motif of the
zinc finger polypeptides;

b) displaying the library in a selection system and screening it against a target DNA
sequence comprising the modified residue;

c) isolating the DNA members of the library encoding zinc finger polypeptides
capable of binding to the target sequence; and

d) optionally, verifying that the zinc finger polypeptides do not bind significantly to
a DNA sequence identical to the target DNA sequence but containing the equivalent
unmodified residue in place of the modified residue.
6. A method according to claim 5, wherein the nucleic acid library encodes a
repertoire of zinc finger polypeptides each possessing more than one zinc finger, the
nucleic acid members of the library being at least partially randomised at one or more of
the positions encoding residues -1, 2, 3 and 6 of the α-helix in a zinc finger and at one or
more of the positions encoding residues -1, 2, 3 and 6 of the α-helix in a further zinc
finger of the zinc finger polypeptides.
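Claims 5 and 6 name the helix positions to be randomised. As a worked illustration of how sharply randomising two fingers at once enlarges the search space, the sketch below computes the theoretical number of variants when each of positions -1, 2, 3 and 6 is fully randomised. Full randomisation is an idealising assumption: 20 amino acids per position, or 32 codons under an NNK scheme, neither of which is specified in the source. The claims only require the positions to be at least partially randomised. The function name is hypothetical.

```python
# Helix positions randomised in claims 5 and 6.
RANDOMISED_POSITIONS = (-1, 2, 3, 6)


def library_diversity(n_fingers: int, options_per_position: int = 20) -> int:
    """Theoretical diversity if every randomised position in every finger
    independently takes any of `options_per_position` values."""
    return options_per_position ** (len(RANDOMISED_POSITIONS) * n_fingers)


print(library_diversity(1))      # 160000 protein variants, one finger
print(library_diversity(2))      # 25600000000 protein variants, two fingers
print(library_diversity(2, 32))  # 1099511627776 codon combinations (NNK)
```

The exponent doubles when a second finger is randomised, which is why partial randomisation of the positions is explicitly allowed.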

7. A method according to claim 5 or claim 6, wherein the modified residue is 5-meC 
and the unmodified residue is C. 

8. A method according to claim 5 or claim 6, wherein the modified residue is U and 
the unmodified residue is T. 

9. A method according to any one of claims 5 to 8, wherein the library is screened by 
phage display. 

10. A method according to any one of claims 5 to 9, wherein the or each zinc finger
has the general primary structure

(A) X^a C X(2-4) C X(2-3) F X^c X X X X L X X H X X X^b H - linker

-1 1 2 3 4 5 6 7 8 9

wherein X (including X^a, X^b and X^c) is any amino acid.

11. A method according to claim 10 wherein X^a is F/Y-X or P-F/Y-X.

12. A method according to claim 10 or claim 11 wherein X(2-4) is selected from any one
of: S-X, E-X, K-X, T-X, P-X and R-X.

13. A method according to any one of claims 10 to 12 wherein X^b is T or I.
14. A method according to any one of claims 10 to 13 wherein X(2-3) is G-K-A, G-K-C,
G-K-S, G-K-G, M-R-N or M-R.

15. A method according to any one of claims 10 to 14 wherein the linker is T-G-E-K
or T-G-E-K-P.

16. A method according to any one of claims 10 to 15 wherein position +9 is R or K. 

17. A method according to any one of claims 10 to 16 wherein positions +1, +5 and
+8 are not occupied by any one of the hydrophobic amino acids, F, W or Y.

18. A method according to claim 17 wherein positions +1, +5 and +8 are occupied by 
the residues K, T and Q respectively. 

19. A method for preparing a DNA binding polypeptide of the Cys2-His2 zinc finger
class capable of binding to a DNA triplet in a target DNA sequence comprising 5-meC, but
not to an identical triplet comprising unmethylated C, the method comprising:

a) selecting a model zinc finger domain from the group consisting of naturally
occurring zinc fingers and consensus zinc fingers; and

b) mutating the finger by the method of any one of claims 3 to 17. 

20. A method according to claim 19, wherein the model zinc finger is a consensus
zinc finger whose structure is selected from the group consisting of the consensus
structure PYKCPECGKSFSQKSDLVKHQRTHTG and the consensus
structure PYKCSECGKAFSQKSNLTRHQRIHTGEKP.
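The Python sketch below illustrates one way to apply the general structure of claim 10. It encodes the structure as a regular expression and reads off the helix residues of the two consensus fingers of claim 20. Only the invariant Cys, Phe, Leu and His positions and the spacer lengths are checked. The narrower options of claims 11 to 18 are not enforced, and the pattern and function names are hypothetical. For the second consensus finger, the residues extracted at +1, +5 and +8 are K, T and Q, and position +9 is R, which is consistent with claims 16 and 18.

```python
import re
from typing import Dict, Optional

# Skeleton of the general structure in claim 10:
#   X^a C X(2-4) C X(2-3) F X^c X X X X L X X H X X X^b H - linker
# The named group "helix" spans positions -1 to +9 (the ten residues between
# X^c and X^b, with the conserved Leu at +4 and His at +7).
FINGER = re.compile(
    r"C(?P<x24>.{2,4})C(?P<x23>.{2,3})F(?P<xc>.)"
    r"(?P<helix>.{4}L.{2}H.{2})(?P<xb>.)H"
)
POSITIONS = (-1, 1, 2, 3, 4, 5, 6, 7, 8, 9)


def helix_positions(finger: str) -> Optional[Dict[int, str]]:
    """Map helix positions -1..+9 to residues, or None if no match."""
    m = FINGER.search(finger)
    if m is None:
        return None
    return dict(zip(POSITIONS, m.group("helix")))


for consensus in ("PYKCPECGKSFSQKSDLVKHQRTHTG",
                  "PYKCSECGKAFSQKSNLTRHQRIHTGEKP"):
    pos = helix_positions(consensus)
    print(consensus, {p: pos[p] for p in (-1, 3, 6)},
          "X^b =", FINGER.search(consensus).group("xb"))
```

Positions -1, +3 and +6 are printed because, under the rules of claim 4, they carry most of the base-specific contacts.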

21. A method according to claim 19 wherein the model zinc finger is a naturally
occurring zinc finger whose structure is selected from one finger of a protein selected
from the group consisting of Zif268 (Elrod-Erickson et al. (1996) Structure 4:1171-1180),
GLI (Pavletich and Pabo (1993) Science 261:1701-1707), Tramtrack (Fairall et
al. (1993) Nature 366:483-487) and YY1 (Houbaviy et al. (1996) PNAS (USA)
93:13577-13582).

22. A method according to claim 21 wherein the model zinc finger is finger 2 of
Zif268.

23. A method according to any one of claims 3 to 22 wherein the binding protein 
comprises two or more zinc finger binding motifs, placed N-terminus to C-terminus. 

24. A method according to claim 22, wherein the N-terminal zinc finger is preceded
by a leader peptide having the sequence MAEEKP. 

25. A method according to claim 23 or claim 24, wherein the DNA binding protein is
constructed by recombinant DNA technology, the method comprising the steps of:

a) preparing a DNA coding sequence encoding two or more zinc finger binding motifs
preparable according to claim 23 or 24, placed N-terminus to C-terminus;

b) inserting the DNA sequence into a suitable expression vector; and

c) expressing the DNA sequence in a host organism in order to obtain the DNA binding
protein.

26. A method according to any one of claims 3 to 25 comprising the additional steps of
subjecting the DNA binding protein to one or more rounds of randomisation and selection
in order to improve the characteristics thereof.

27. A zinc finger polypeptide which binds to a target DNA sequence containing a 
modified base but not to an identical sequence containing the equivalent unmodified base, 
preparable by a method according to any one of claims 3 to 26. 
FIG. 1B: zinc finger α-helix sequences (positions -1 to 9) for the target sites (i) GCGGMGGCG, (ii) GCGGTGGCG and (iii) GCGGCGGCG.
FIG. 3: α-helix residues of finger 3 (F3, positions +6 to -1) and finger 2 (F2, positions +6 to +3) of clones zfHAE(M), zfHHA(M), zfHAE(Y), zfHHA(Y) and zfHAE(T), aligned with their duplex binding sites (M = 5-methylcytosine, Y = pyrimidine (C/T/M), R = purine (A/G)).